A STUDY OF TRANSFER: SEX DIFFERENCES 
IN THE REASONING PROCESS 


MAX MARTIN KOSTICK 


State Teachers College at Boston 


A study was undertaken to determine whether sex differences in 
the transfer of training could be discovered through a statistically 
controlled experiment. Transfer for this study was defined as the 
successful application of skill or knowledge in adjusting to a new 
situation. The amount of transfer to the new situation was meas- 
ured by the difference between the actual performance and the ex- 
pected performance. The actual performance was the attainment 
after skill or knowledge to be transferred had been learned. The 
expected performance was what the attainment would have been 
had the skill or knowledge to be transferred not been learned. 

For the purpose of delimiting the scope of the investigation, 
transfer of training was restricted to the successful application of 
generalizations only. Sex differences in the amount of transfer were 
measured by comparing scores obtained in the new situation with 
scores predicted on the basis of knowledge acquired previous to 
training. 

The technique used in this investigation differed from the cur- 
rently popular techniques for measuring sex differences in two re- 
spects: first, this technique used a statistical control on previous 
knowledge; second, while the prevalent techniques for measuring 
sex differences used neuter items in preference to biased items, the 
present investigator used biased items in preference to neuter 
items. 

Because intelligence and transfer of training are intimately re- 
lated, the present writer thought it advisable to examine experi- 
ments on sex differences in intelligence. The results of this examina- 
tion revealed the general belief that there is no sex difference in 
intelligence. However, the writer feels that this belief is not based 
upon sound evidence. The prevalent intelligence tests are invalid 
when used for the measurement of sex differences in intelligence 
because of the nature of the items. The accepted procedure for 
selecting items is by the method of ‘fairness to boys and girls’. 
This method results in items, and in turn tests, that do not and 
cannot disclose sex differences in total scores. Surely such neuter 
items are not appropriate for detecting sex differences. 

As sex differences in intelligence have not been accurately meas- 
ured, any valid information as to sex differences in the transfer of 
training acts as a wedge into the baffling problem of measuring sex 
differences in mental activities. 

The major problem, ‘Who transfer more, boys or girls?’ was 
broken down into two problems, ‘Do girls transfer more than boys?’ 
and ‘Do boys transfer more than girls?’ Each problem requires a 
special experimental design with different source materials. As a 
preparation for the selection of the source materials, numerous sur- 
veys on reading habits and reading preferences were consulted. 
These surveys indicated that boys prefer to read and actually do 
read more about Science, whereas girls prefer to read and actually 
do read more about Home Economics. Accordingly, Science material 
was used as source material for investigating whether girls transfer 
more than boys, and Home Economics material as source material for 
investigating whether boys transfer more than girls. Should boys be 
found to obtain higher scores on Home Economics deduction items, 
the higher scores could not be explained away on the basis of sex 
differences in previous knowledge. 

The writer investigated transfer of training under two different 
conditions: when hints are given and when hints are not given. A 
total of three hundred boys and three hundred girls drawn from 
Quincy, Massachusetts; Boise, Idaho; and Charleston, South 
Carolina were used. 

The design for investigating whether boys transfer more than 
girls when no hints are given contained three types of items: (1) 
deduction items, (2) information items, (3) items on principles. The 
deduction items were used as a measure of transfer of training. The 
information items were used as a check on whether the deduction 
items were biased in favor of one sex. The items on the principles 
were used as a measure of previous knowledge. 

The items were selected in accordance with the following 
specifications: 
1) Items selected for the first section of the test were based on 
Science material. Items selected for the second section were based 
on Home Economics material. 

2) The Home Economics items were presented to several females 
for approval as to whether or not the items were as fair to girls as 
to boys. Fairness in this instance was based upon interest, utility 
and preoccupation rather than upon obtained scores. 

3) Items were chosen which seemed to be new, interesting and 
challenging and to have application in the pupils’ daily lives. 

4) The deduction items that were selected could be answered by 
the application of the principles described in the items given. These 
principles could be arrived at by induction from the helpful sug- 
gestions. 

5) All the items were written in clear, concise language that 
could be understood by pupils several grades below the eleventh. 
Thus, clarity was assured for the test group which was composed 
of pupils selected from the eleventh grade. 


SAMPLES OF ITEMS FROM SCIENCE MATERIAL 

General Information Item 
[illegible] . . . . . . . . . . T F 

Item on Principle 
Still air is a good conductor of heat. . . . . . . . . . T F 

Deduction Items 
On a cold day, a loose-fitting glove will keep your hand warmer than 
[illegible] . . . . . . . . . . T F 
Insulating material is made loose and fluffy so it will be easier to 
[illegible] . . . . . . . . . . T F 

Helpful information given between pre-test and post-test 
Less heat is lost through hollow building blocks than through solid 
ones because ‘still air’ does not conduct heat as well as cement does. 

SAMPLES OF ITEMS FROM HOME ECONOMICS MATERIAL 

General Information Item 
As a rule, bluing is used to make clothes appear blue. . . . . T F 

Item on Principle 
A fiber of cloth is weakened by twisting and bending. . . . . T F 

Deduction Items 
For better care, place fresh linen upon the linen already in the 
[illegible] . . . . . . . . . . T F 
Washing clothes will wear them out. . . . . . . . . . T F 

Helpful information given between pre-test and post-test 
The wearing qualities of textiles are sometimes tested by counting 
the number of times the fiber of the cloth can be bent and twisted. 
The test was administered to over twelve hundred eleventh-grade 
pupils. From this group, six hundred test papers were selected for 
the final statistical analysis. These papers were selected on the 
basis of age and type of high-school program: test papers of pupils 
between the ages of fifteen and seventeen were used, together with 
a proportionate number of papers from college-preparatory and 
non-college-preparatory pupils. 

The test papers selected from Quincy, Massachusetts, were 
screened further by pairing off the boys’ and girls’ papers only 
when both members of the pair had obtained the same reading 
comprehension score. Thus any sex differences in the scores on the 
deduction items, if found, could not reasonably be attributed to sex 
differences in reading comprehension. 
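The Quincy screening step, keeping a boy's paper only when a girl's paper with the same reading-comprehension score is available, can be sketched as follows. The article does not say how ties were resolved. The rule used here (take the first unused girl's paper at that score), the function name and the sample papers are hypothetical.

```python
from collections import defaultdict


def pair_by_reading_score(boys, girls):
    """boys, girls: lists of (paper_id, reading_score).

    Returns (boy_id, girl_id) pairs with identical reading scores.
    Papers without a same-score partner are dropped from the analysis.
    """
    unused_girls = defaultdict(list)
    for paper_id, score in girls:
        unused_girls[score].append(paper_id)

    pairs = []
    for paper_id, score in boys:
        candidates = unused_girls.get(score)
        if candidates:  # a same-score girl's paper is still unpaired
            pairs.append((paper_id, candidates.pop(0)))
    return pairs


# Hypothetical papers: only one boy-girl pair shares a score.
boys = [("B1", 50), ("B2", 50), ("B3", 60)]
girls = [("G1", 50), ("G2", 70)]
print(pair_by_reading_score(boys, girls))  # [('B1', 'G1')]
```

Because every retained pair has equal reading scores, the paired boys and girls have identical mean reading-comprehension scores by construction.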

However, reading comprehension is only one of many compo- 
nents that might account for sex differences in the scores on deduc- 
tion items. In order to be reasonably certain that sex differences in 
scores on deduction items indicated sex differences in transfer of 
training, other possible explanations for sex differences in the 
scores must be eliminated or at least minimized. Many possible 
explanations for sex differences may be eliminated by determining 
whether or not the boys and girls used in this study were unique in 
a way that would invalidate the general conclusion that boys trans- 
fer more than girls. 

How unique is the sample group of boys and girls? This question 
was investigated by comparing boys and girls on the basis of stand- 
ardized tests, of teachers’ marks on school subjects and teachers’ 
ratings of pupils’ personality. 

The pupils in Quincy took the Pintner Intermediate Intelligence 
Test when in the tenth grade, and the mean IQ score for the girls 
(112.5) was found to be slightly higher than the mean score for the 
boys (110.0). Hence, any superiority of boys’ scores on deduction 
items could not reasonably be attributed to superiority in intelli- 
gence. 

On examining school marks of the experimental group, the girls 
were found to obtain slightly higher marks than the boys in both 
Mathematics and Science. However, teachers’ marks on school sub- 
jects are known to show a ‘halo effect’ in favor of girls. This ‘halo 
effect’ may be partly due to sex differences in personality. 

On examining the personality ratings, the girls were found to 
receive a slightly higher rating on ‘initiative’ and a significantly 
higher rating on ‘applicability’. Thus, if boys received higher scores 
on deduction items, the higher scores could not be explained away 
on the basis of personality traits such as initiative or applicability. 

Only on the Arithmetic Reasoning test did boys obtain higher 
mean scores than girls and, even then, the difference between the 
scores was not significant. It may well be that Arithmetic Reason- 
ing has elements of deduction and transfer. 

The sample of pupils selected for this investigation was not found 
to be unique in a way that would invalidate the conclusion that 
boys transfer more than girls. 

In adapting a statistical design to this investigation, the major 
question, ‘Who transfer more, boys or girls?’, was broken down 
into ‘Who transfer more when helpful suggestions are given, boys or 
girls?’ and ‘Who transfer more when helpful suggestions are not 
given, boys or girls?’ This breakdown makes it possible to isolate 
the effect suggestion may have on test scores. To account for sex 
differences in reading preferences, reading habits, life goals and 
previous knowledge, the two questions were further broken down 
into four: 

1) Do girls transfer more Science material than boys when help- 
ful suggestions are not given? 

2) Do boys transfer more Home Economics material than girls 
when helpful suggestions are not given? 

3) Do girls transfer more Science material than boys when 
helpful suggestions are given? 

4) Do boys transfer more Home Economics material than girls 
when helpful suggestions are given? 


Table 1 is the design for investigating the question, ‘Do girls 
transfer more Science material than boys when helpful suggestions 
are not given?’ The schema is shown alongside the actual statistical 
results. 

Do girls transfer more Science material than boys when helpful 
suggestions are not given? The answer seems to be ‘No’; at least, 
not all of the requirements of the proof were fulfilled. The first 
requirement, that girls should not obtain higher scores on Science 
information, was fulfilled. The data do not support the view that 
boys have less knowledge of the specific type of Science material 
used in the test. 

The second requirement, that girls do not obtain higher scores on 
principles pertinent to deduction items, was also fulfilled. The 
findings support the view that the boys’ scores on Science principles 
were significantly higher than the girls’ scores. Therefore, if girls 
did obtain higher scores on deduction items than boys, the higher 
scores could not reasonably be attributed to sex differences in 
previous knowledge. 

TABLE 1.—DO GIRLS TRANSFER MORE THAN BOYS? 

Tests on                  Requirements for proof     Actual results 
1) Information items      Girls should not obtain    Girls did not obtain higher 
   (Science material)     higher scores              scores. P = .00,00+ 
2) Items on principles    Girls should not obtain    Girls did not obtain higher 
   (Science material)     higher scores              scores. P = .0035 
3) Deduction items        Girls should obtain        Girls did not obtain higher 
   (Science material)     higher scores              scores. P = .00,000 

The third and crucial requirement, that girls obtain higher scores 
on deduction items on Science, was not fulfilled. The data indicate 
that the boys obtained significantly higher scores on the Science 
deduction items (P = .00,000); therefore, these results cannot 
support the view that girls transfer more than boys. 


“Do boys transfer more Home Economics material than girls 
when helpful suggestions are not given?” may be investigated with 
a new design. Table 2 is the schema with the actual statistical 
findings. 

Do boys transfer more Home Economics material than girls 
when helpful suggestions are not given? The answer is ‘Yes’. All 
three requirements were fulfilled. The first requirement, that boys 
should not obtain higher scores on Home Economics information, 
was fulfilled. The data support the view that girls have more 
knowledge of the specific type of material used in the test. Hence, 
any superiority boys may have over girls on deduction items could 
not reasonably be attributed to a superiority in previous knowledge 
of Home Economics. 

The second requirement, that boys do not obtain higher scores 
on principles pertinent to deduction items, was also fulfilled: the 
girls’ mean score on the principles was slightly, though not 
significantly, higher than the boys’ (P = .30). Therefore, any 
superiority of boys’ scores over girls’ scores on deduction items 
could not reasonably be attributed to a superiority in knowledge of 
principles useful in figuring out the deduction items. 

TABLE 2.—DO BOYS TRANSFER MORE THAN GIRLS? 

Tests on                  Requirements for proof     Actual results 
1) Information items      Boys should not obtain     Boys did not obtain higher 
   (Home Economics        higher scores              scores. P = .00,000 
   material) 
2) Items on principles    Boys should not obtain     Boys did not obtain higher 
   (Home Economics        higher scores              scores; girls slightly 
   material)                                         higher. P = .30 
3) Deduction items        Boys should obtain         Boys did obtain higher 
   (Home Economics        higher scores              scores. P = .00,000 
   material) 

The third and crucial requirement was that boys obtain signifi- 
cantly higher scores on deduction items drawn from Home Eco- 
nomics material when helpful suggestions are not given. This 
requirement was fulfilled at a level of significance of P = .00,000. 

All three requirements for proving that boys transfer more than 
girls were fulfilled. The findings, therefore, support the statement 
that boys transfer more than girls when helpful suggestions are not 
given. 
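The logic of the proofs summarized in Tables 1 and 2 can be written as a small decision rule. In this hypothetical encoding, the candidate sex must not score higher on the information items and must not score higher on the items on principles. It must also score significantly higher on the deduction items. The article states no significance cutoff. The value `alpha=0.01` and the entry of the reported ‘P = .00,000’ as 0.00001 are assumptions for illustration.

```python
def transfer_proof(info_candidate_higher, principles_candidate_higher,
                   deduction_candidate_higher, deduction_p, alpha=0.01):
    """Hypothetical encoding of the three-requirement proof (no-hint condition).

    'candidate' is the sex being tested for greater transfer, on material
    presumed to favor the other sex. alpha is an assumed cutoff.
    """
    req1 = not info_candidate_higher        # no advantage in general information
    req2 = not principles_candidate_higher  # no advantage in the pertinent principles
    req3 = deduction_candidate_higher and deduction_p < alpha  # crucial requirement
    return req1 and req2 and req3


# Table 1 (girls, Science material): boys scored higher on the deduction items.
print(transfer_proof(False, False, False, deduction_p=0.00001))  # False
# Table 2 (boys, Home Economics material): all three requirements met.
print(transfer_proof(False, False, True, deduction_p=0.00001))   # True
```

Requirements 1 and 2 only rule out alternative explanations. Requirement 3 alone carries the positive evidence, which is why the article calls it crucial.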


“When helpful suggestions are given, do girls transfer more in 
Science material?” may be investigated by the design in Table 3. 


TABLE 3.—GIVEN HELPFUL SUGGESTIONS, DO GIRLS TRANSFER MORE 
THAN BOYS? 

Tests on                     Requirements               Results 
1) Information items         Girls should not gain      Girls did gain slightly 
   (Science material):       more than boys             more than boys 
   pre-test followed by 
   post-test 
2) Deduction items           Girls should obtain        Girls did not obtain 
   (Science material):       higher adjusted            higher adjusted 
   pre-test followed by      post-test scores           post-test scores 
   helpful suggestions 
   and post-test 
Do girls transfer more Science material when helpful suggestions 
are given? The answer is ‘No’. Neither of the requirements 
was fulfilled. The purpose of the first requirement was to determine 
whether or not girls gain more than boys by the ‘practice effect’. 
The practice effect was measured by the difference between the 
pre-test and the post-test scores on the information items. In this 
experiment, the girls were found to gain slightly more than the 
boys by the ‘practice effect’ for both Science material and Home 
Economics material. The girls’ gain for Science material was 1.35 
points while the boys’ gain was 0.63 points. For Home Economics 
material the girls’ gain was 1.65 points while the boys’ gain was 
1.23 points. The first requirement was not satisfactorily met. 
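The practice effect as described, the mean difference between pre-test and post-test scores on the information items, can be sketched below. The sketch also computes the girls' margin over the boys implied by the four reported gains. The function names are hypothetical; the gain values are the ones reported above.

```python
from statistics import mean


def practice_gain(pre_scores, post_scores):
    """Mean post-test minus pre-test score for one group on the information items."""
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


def gain_margins(reported_gains):
    """Girls' gain minus boys' gain for each kind of material (rounded to 0.01)."""
    return {m: round(g["girls"] - g["boys"], 2) for m, g in reported_gains.items()}


# Gains as reported in the text.
REPORTED_GAINS = {
    "Science": {"girls": 1.35, "boys": 0.63},
    "Home Economics": {"girls": 1.65, "boys": 1.23},
}
print(gain_margins(REPORTED_GAINS))  # {'Science': 0.72, 'Home Economics': 0.42}
```

A positive margin for both kinds of material means the girls gained more from practice alone. That is why the requirement ‘girls should not gain more than boys’ fails for the Science design.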

Neither was the second and crucial requirement satisfactorily 
met. After helpful suggestions were given and after allowances 
were made for the effect of previous knowledge of the deduction 
items, the boys were found to obtain significantly higher scores on 
deduction items on Science (P = .0014). ‘Allowances’ were made by 
discounting the effect of pre-test score upon post-test score through 
the analysis of covariance. 
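The analysis of covariance can be sketched for the simple case of one covariate (the pre-test score) and several groups. This is the textbook one-way ANCOVA, offered only as an illustration. The article does not report its computational layout, and the function name, group labels and toy scores are hypothetical. Adjusted post-test means use the pooled within-group slope of post-test on pre-test.

```python
from statistics import mean


def ancova_one_covariate(groups):
    """groups: dict name -> (pre_scores, post_scores). Returns adjusted means, F, dfs."""
    all_pre = [x for pre, _ in groups.values() for x in pre]
    all_post = [y for _, post in groups.values() for y in post]
    n_total, k = len(all_pre), len(groups)
    gx, gy = mean(all_pre), mean(all_post)

    # Total sums of squares and cross-products about the grand means.
    t_xx = sum((x - gx) ** 2 for x in all_pre)
    t_xy = sum((x - gx) * (y - gy) for x, y in zip(all_pre, all_post))
    t_yy = sum((y - gy) ** 2 for y in all_post)

    # Within-group (pooled) sums about each group's own means.
    e_xx = e_xy = e_yy = 0.0
    for pre, post in groups.values():
        mx, my = mean(pre), mean(post)
        e_xx += sum((x - mx) ** 2 for x in pre)
        e_xy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        e_yy += sum((y - my) ** 2 for y in post)

    b_within = e_xy / e_xx                  # common slope of post-test on pre-test
    adj_within = e_yy - e_xy ** 2 / e_xx    # error SS after removing the covariate
    adj_total = t_yy - t_xy ** 2 / t_xx
    adj_between = adj_total - adj_within
    df_between, df_within = k - 1, n_total - k - 1
    f_ratio = (adj_between / df_between) / (adj_within / df_within)

    # Post-test means adjusted to the grand mean of the pre-test.
    adjusted = {name: mean(post) - b_within * (mean(pre) - gx)
                for name, (pre, post) in groups.items()}
    return adjusted, f_ratio, (df_between, df_within)


toy_scores = {"boys": ([1, 2, 3], [2, 4, 6]), "girls": ([2, 3, 4], [3, 4, 5])}
adjusted, f_ratio, dfs = ancova_one_covariate(toy_scores)
print(adjusted, round(f_ratio, 3), dfs)  # {'boys': 4.75, 'girls': 3.25} 7.364 (1, 3)
```

The two toy groups have equal raw post-test means (4.0). Once the girls' higher pre-test scores are discounted, the boys' adjusted mean is higher. If SciPy is available, `scipy.stats.f.sf(f_ratio, *dfs)` gives the corresponding upper-tail probability.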


“When helpful suggestions are given, do boys transfer more in 
Home Economics material?” may be investigated by the design in 
Table 4. 

Do boys transfer more in Home Economics material when help- 
ful suggestions are given? The answer is ‘Yes’. Both requirements 
were fulfilled. 


TABLE 4.—GIVEN HELPFUL SUGGESTIONS, DO BOYS TRANSFER MORE 
THAN GIRLS? 

Tests on                     Requirements               Results 
1) Information items         Boys should not gain       Boys did not gain 
   (Home Economics):         more than girls            more than girls 
   pre-test followed by 
   post-test 
2) Deduction items           Boys should obtain         Boys did obtain higher 
   (Home Economics):         higher adjusted            adjusted post-test 
   pre-test followed by      post-test scores           scores. P = .00,000 
   helpful suggestions 
   and post-test 
Sex Differences in the Reasoning Process 457 


Girls were found to gain slightly more than boys by the ‘practice 
effect’. Evidence for this statement has been given in the discussion 
of the third question, and therefore the first requirement of the 
proof has been fulfilled. 

Evidence at P = .00,000 supports the view that boys obtain 
higher adjusted scores on the deduction items on Home Economics 
material when helpful suggestions are given. 

Hence, with both requirements fulfilled, it may be stated that 
boys transfer more than girls when helpful suggestions are given. 
By synthesizing the above conclusion that boys transfer more than 
girls when helpful suggestions are given with the previously stated 
conclusion that boys transfer more than girls when helpful sugges- 
tions are not given, one is led to the general conclusion that boys 
generally transfer more than girls. 

The conclusion that boys generally transfer more than girls is 
based on the fact that boys obtained significantly higher scores on 
deduction items in Home Economics material before suggestions 
were given, and also after helpful suggestions were given. 

The fact that boys obtained significantly higher scores on de- 
duction items in Home Economics could not be attributed to a 
superiority in intelligence, because the boys did not receive higher 
intelligence test scores. 

The higher scores of the boys could not be due to a superiority in 
‘applicability’ or ‘initiative’, as the girls had received higher 
ratings from their teachers on both traits: significantly higher on 
‘applicability’ and slightly higher on ‘initiative’. 

The higher scores that boys received could not be explained on 
the basis of reading preference and reading activities because sur- 
veys have shown that girls prefer to read and actually do read 
more about Home Economics than boys do. 

The boys’ higher scores on deduction items could not be ex- 
plained on the basis of previous knowledge because the scores of 
the boys were lower than those of the girls on information drawn 
from Home Economics material. 

The difference in the scores could not be explained on the basis 
of knowledge of the principles pertinent to the deduction items. 
The girls’ mean score on the principles was slightly higher than the 
boys’ mean score. 

The boys’ greater gain in scores from pre-test to post-test on 














458 The Journal of Educational Psychology 


deduction items cannot be attributed to superiority in the practice 
effect. Girls were found to gain slightly more than boys by the 
practice effect. 

A superiority in the scores could not be attributed to a sex differ- 
ence in reading comprehension because the boys and girls used in 
the investigation were experimentally matched on the basis of 
reading comprehension. 

In summary, it may be said that since the boys’ superior mean 
score on the deduction items is not due to intelligence, to previous 
knowledge, to reading comprehension, to reading preferences or 
reading activities, to personality traits such as applicability and 
initiative, to practice effect, or to knowledge of pertinent principles, 
it is reasonable to conclude that the higher scores on the deduction 
items are due to boys’ superior ability to transfer. 























PERCEPTION IN NUMBER SKILLS—A STUDY 
IN TACHISTOSCOPIC TRAINING¹ 


JOHN L. PHILLIPS, JR. 


Department of Educational Psychology, University of Utah 


A perusal of the literature on tachistoscopic training discloses 
what appears to the writer as a single basic issue, with two major 
problems growing out of it. The basic issue is between two classes 
of theory, called in the psychology of reading, ‘peripheral’ and 
‘central’. (These terms can be extended to refer to any skill 
involving the use of the higher mental processes.) The two problems 
growing out of it are (1) can peripheral training improve performance 
in these skills? and (2) can central training improve performance 
in these skills? 

The answer to the latter question is so obviously affirmative as 
to preclude any extended discussion. No one would question the 
necessity of concept formation as a prerequisite to reading, for ex- 
ample, or to recognition of aircraft, or to picking out the imperfect 
items on an assembly line. The other proposition is not so apodictic; 
it is the question that has provided stimulus for the present study. 
Can peripheral training improve performance in mental skills?— 
more specifically, can this type of training use short-exposure 
methods to advantage? 

It can be stated with considerable assurance that performance 
on perceptual tests of various kinds can be improved by practice 
in the reproduction of briefly-exposed stimuli. But does this im- 
provement in laboratory tasks transfer to skills involved in scho- 
lastic and other more utilitarian performances? 

Again the answer is seemingly affirmative; but when only those 
studies are considered that include trained control groups—i.e., 
groups trained by one or more alternative methods—it becomes 
apparent that relatively little evidence is available. 

¹ Adapted from a thesis submitted to the Graduate School of the 
University of Utah and to the Department of Educational Psychology 
in partial fulfillment of the requirements for the Ph.D. degree. The 
study was supported by a research grant from the University 
Research Committee and was completed under the chairmanship of 
Professor R. DeVerl Willey. 

Dallenbach, working with normal (2) and later with feeble-minded 
(3) school children, found a general increase in scholastic 
achievement as compared with a traditionally trained group; 
Renshaw (8, 10) and Melcer and Brown (6) confirmed his findings. 
Renshaw (9) also reported a rather startling superiority for 
tachistoscopic training in reading at the college level. Brown (1) used the 
instrument to advantage in teaching spelling to seventh-graders. 
Winger’s (15) college students in beginning typing and the Navy’s 
(Neville, 7) aircraft recognition trainees performed better when 
trained by tachistoscopic methods. 

On the other hand, Weber (13), Sutherland (12), and Freeburne 
(4) set up training programs in reading for college students in such 
a way that the gains of the tachistoscopically-trained groups could 
be compared with those of groups trained by alternative methods, 
and although they found large gains being scored by their subjects, 
these gains were shared equally by all groups. Hoyt et al. (5) 
tried the same thing in a college art course, with the same 
result. 

Taken as a whole, then, the evidence indicates a considerable 
value for short-exposure methods; but the evidence is far from con- 
clusive, for there are important experiments on record in which the 
results are definitely negative. These methods have failed to estab- 
lish decisively their superiority over the more traditional tech- 
niques with which they must in the last analysis compete for 
preéminence as clinical and classroom procedures. 


AN EXPERIMENT IN ARITHMETIC TRAINING 


Computational skill, defined as the ability to respond consist- 
ently and correctly to the presentation of certain number combi- 
nations, was selected to be the focus of the investigation reported 
in this paper. An experiment was conducted in the fourth grade of 
a Salt Lake City, Utah, public school (first experiment) and then 
repeated in another school in another part of the city (second ex- 
periment). The primary purpose was the testing of two hypotheses: 
(1) that tachistoscopic training is effective in the teaching of 
computational skills, and (2) that tachistoscopic training is more 
effective than a more conventional, paper-and-pencil approach. 
(Other problems were considered along the way, and will be 
discussed in the section on conclusions, but they were not the 
central objectives.) 
Three experimental groups and a control group were equated in 
terms of intelligence (mean SI², approximately one hundred), 
grade in school (fourth), sex, visual acuity, and number of training 
days present. Inasmuch as the classes had already been organized 
when the experiment began, it was necessary to do this equating 
of groups by casting out cases after the experiment had been 
completed. N for each of the equated groups in the first experiment 
was ten (five girls and five boys), while in the second experiment 
there were five girls and nine boys, or fourteen in each group. In 
the first experiment Group IV, the control group (see below, Group 
IV), was eliminated as a result of a procedural error by the teacher 
in charge; in the second, all four groups were studied. In both ex- 
periments, the first half of the training was supervised by the 
regular classroom teacher (the same teacher for all groups), while 
in the latter half, the experimenter was in charge. Conditions of 
training were as follows: 

Group I: a paper-and-pencil procedure in which the practice 
sheets were like the pages in a workbook. 

Group II: an adaptation of the tachistoscopic method developed 
by Willey and Billington and published in the Keystone teaching 
manual. It consisted primarily of reproducing (on paper) complete 
number combinations (both sides of the equation sign) when these 
were flashed onto a screen. 

Group III: developed by the writer for these experiments; it also 
made use of the tachistoscope, but in a different way. Instead of 
flashing the entire combination onto the screen in a single exposure, 
the teacher flashed first the problem and then, after a short pause, 
the answer. In the 3-sec. interim between ‘problem flash’ and 
‘answer flash’, the subject attempted to recall the correct answer 
and record it on paper. 

Group IV: planned as a control in which no special training 
would be given on the set of combinations specified for training of 
the other groups. This control group merely continued with its 
regular classroom work, encountering the experimental combina- 
tions only as they happened to occur in textbook assignments and 
with no special attention being directed toward them. 

The amount of time spent in arithmetic classes per day was the 





2 The SI (sigma index) is a standard score in a distribution of Salt Lake 
City children’s scores with a mean of 100 and a standard deviation of 20. 
same for all groups. The amount of time spent per period in working 
on the experimental combinations³ was equal for all excepting 
the control group, where it was not measurable but was known to 
be very small. 

The training series comprised twenty multiplication and twenty 
division combinations selected primarily from the course of study 
for the next higher grade (fifth). These were typed onto transparent 
slides and projected in .11 sec. exposures⁴ by a Keystone 
tachistoscope—an overhead projector with a diaphragm shutter 
attachment. 

Assessment of arithmetic gains made during training was done 
by means of a battery of arithmetic tests given before and after 
training, which had been constructed especially for the purpose 
and which presented the trained combinations in each of four ways: 
(1) horizontal, original order—exactly as presented in training; 
(2) vertical, original order; (3) horizontal, reversed order; and (4) 
vertical, reversed order.⁵ The effect of training on perceptual 
ability was tested by asking subjects also before and after training 
to reproduce on paper numbers of varying length (one through 
nine digits) when flashed at .03 sec. Slides were made in the same 
way as were those in the training series, and exposures were made 
by the same equipment. 
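The four presentations of each trained combination can be generated mechanically. The sketch below follows the article's footnote examples (9 × 7 and 7 × 9; 63 ÷ 9 and 63 ÷ 7). The function name, the dictionary keys and the spacing of the vertical forms are assumptions, as is the typewriter rendering ‘9)63’ for the long-division layout.

```python
def presentation_forms(a, b, op):
    """Return the four test presentations of a trained combination 'a op b'.

    op is '×' (a × b) or '÷' (a ÷ b, assumed to divide evenly). Vertical
    forms are two-line strings; exact spacing is an assumption.
    """
    if op == "×":
        return {
            "horizontal, original": f"{a} × {b} =",
            "vertical, original": f" {b}\n×{a}",   # b written above × a
            "horizontal, reversed": f"{b} × {a} =",
            "vertical, reversed": f" {a}\n×{b}",   # a written above × b
        }
    if op == "÷":
        quotient = a // b
        return {
            "horizontal, original": f"{a} ÷ {b} =",
            "vertical, original": f"{b}){a}",      # long-division layout
            "horizontal, reversed": f"{a} ÷ {quotient} =",
            "vertical, reversed": f"{quotient}){a}",
        }
    raise ValueError("op must be '×' or '÷'")


for form in presentation_forms(63, 9, "÷").values():
    print(form)
```

Generating all four forms from one stored combination keeps the test battery consistent with the training series. Only form (1) matches the training presentation exactly.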

The data were analyzed into five categories: Perception;⁶ Horizontal, 
Original Order;⁷ Multiplication;⁸ Division;⁹ and Arithmetic, 
Total;¹⁰ these were converted into difference scores (‘after training’ 
minus ‘before training’) and subjected to analysis of variance in all 
categories. 
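The procedure just described converts scores to differences (after training minus before training) and submits them to a one-way analysis of variance across training groups. It can be sketched as follows. This is the standard one-way ANOVA F-ratio; the function names and the scores in the example are hypothetical, not the study's data.

```python
from statistics import mean


def difference_scores(before, after):
    """Gain for each pupil: 'after training' minus 'before training'."""
    return [a - b for b, a in zip(before, after)]


def one_way_anova(gains_by_group):
    """F-ratio and degrees of freedom for a one-way ANOVA on gain scores."""
    groups = list(gains_by_group.values())
    all_scores = [g for grp in groups for g in grp]
    grand = mean(all_scores)
    ss_between = sum(len(grp) * (mean(grp) - grand) ** 2 for grp in groups)
    ss_within = sum((g - mean(grp)) ** 2 for grp in groups for g in grp)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), (df_between, df_within)


# Hypothetical scores for two training groups.
before = {"I": [10, 12, 14], "II": [10, 12, 14]}
after = {"I": [11, 14, 17], "II": [14, 17, 20]}
gains = {g: difference_scores(before[g], after[g]) for g in before}
f_ratio, dfs = one_way_anova(gains)
print(gains, f_ratio, dfs)  # {'I': [1, 2, 3], 'II': [4, 5, 6]} 13.5 (1, 4)
```

Because the F-ratio grows with between-groups variance, adding a group whose gains differ sharply from the rest, such as an untrained control, tends to raise it.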





3 The length of practice periods varied, but they averaged approximately 
ten minutes for the two experiments. In the first experiment there were 
fourteen daily practice sessions; in the second there were twelve. 

4 Calibrated ‘1/25 sec.’ on the face of the Flashmeter. 

5 For example: (1) 9 × 7 = __; (2) 7 written above × 9; (3) 7 × 9 = __; 
(4) 9 written above × 7; or (1) 63 ÷ 9 = __; (2) 9)63; (3) 63 ÷ 7 = __; 
(4) 7)63. 

6 As measured by the test described in the paragraph above. 

7 Including both multiplication and division. 

8 Including both horizontal and vertical, both original and reversed. 

9 Including both horizontal and vertical, both original and reversed. 

10 Including multiplication and division, horizontal and vertical, and 
original and reversed. 
DIVERGENT RESULTS FROM THE TWO EXPERIMENTS 


Results from the two experiments can be characterized as con- 
vergent (in basic agreement) or divergent (in disagreement). The 
latter are important insofar as they throw doubt on the former. 

Before the data were converted into difference scores, three of 
the pre-training measures were singled out for examination of their 
interrelations; these three (the total score from all the arithmetic 
tests, the perception test score, and the intelligence test score) 
were selected because of the theoretical significance that attaches 
to relations among them. 

On the basis of the theory advanced by proponents of tachisto- 
scopic training, there should be some correlation between per- 
formance in any intellective function and the perceptual skill 
which, according to the peripheral theory, underlies it. For exam- 
ple, arithmetic test scores and perception test scores should be 
positively correlated, as should intelligence test scores and arith- 
metic test scores. Data from the two experiments do not agree on 
this point: the first experiment showed no correlation, the second 
produced coefficients of r = .30 and r = .25, respectively, for the 
two correlations. Both are significant beyond the .05 level. 
Since the total N in the first experiment was only thirty, while that 
for the second was fifty-six, differences in significance levels may be 
due in part to the difference in N. 
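The significance of the two second-experiment coefficients can be checked with the usual t test for a correlation, t = r√(N − 2)/√(1 − r²), on N − 2 degrees of freedom. This is a minimal sketch that assumes N = 56 for both coefficients. Under that assumption, r = .30 gives t ≈ 2.31 and r = .25 gives t ≈ 1.90 on 54 df. The two-tailed .05 critical value for 54 df is about 2.00 and the one-tailed value is about 1.67. On this reading, r = .25 clears the .05 level only under a one-tailed (directional) test. The article does not say which test was used.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


for r in (0.30, 0.25):
    print(r, round(t_for_r(r, 56), 2))  # 0.3 2.31, then 0.25 1.9
```

With N = 30, as in the first experiment, the same coefficient yields a smaller t. This is consistent with the article's remark that the difference in N may account for part of the difference in significance.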

When the data were converted into difference scores, a glaring
discrepancy appeared between the two experiments: the first
yielded no significant F-ratios in the analysis of variance, while in
the second, with the single exception of the perception analysis,
all of the F’s were significant. This, too, could be the result of a
larger N in the second experiment, though another factor that 
undoubtedly influenced the size of the F-ratios was the presence 
of a control group in the second experiment. In every analysis, the 
largest differences were between this one group and all the others. 
With no control group in the analysis, the F-tests in the first ex- 
periment were handicapped not only by a smaller N, but by loss 
of the greatest contributor to ‘between-groups’ variance. 
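A minimal one-way analysis of variance in Python shows why dropping the control group weakens the F-test. The gain scores below are invented purely for illustration and are not data from either experiment. The first case has three groups with similar gains. The second adds a low-gaining "control" group to the same three.

```python
from statistics import fmean


def one_way_f(groups: list[list[float]]) -> float:
    """F-ratio for a one-way analysis of variance: MS between / MS within."""
    scores = [x for g in groups for x in g]
    k, n = len(groups), len(scores)
    grand_mean = fmean(scores)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical gain scores, chosen only to make the arithmetic easy to follow.
experimentals = [[5, 6, 7], [4, 5, 6], [4, 5, 6]]
control = [[1, 2, 3]]

print(round(one_way_f(experimentals), 3))            # 1.0
print(round(one_way_f(experimentals + control), 3))  # 9.0
```

The within-groups mean square is the same (1.0) in both cases. Only the between-groups term grows when the low-gaining group is included.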


CONVERGENT RESULTS FROM THE TWO EXPERIMENTS 


It might be fruitful to accept tentatively the above explanation 
of the lack of significance of ‘between-groups’ differences in gain 
scores obtained in the first experiment, and to compare the direction
of differences obtained in the two experiments wherever these
are shown to be significant in the second experiment. 

The second experiment produced a significant difference in favor 
of Group I over Group II and of Group I over Group III in every 
analysis but the one on perception; differences in the first experiment
were also in this same direction. No significant differences
appeared between Group II and Group III in either experiment.

In the second experiment Groups II and III were significantly 
superior to Group IV in every analysis but the one on perception. 
There was no control group in the first experiment, but there is 
every reason to believe that such a group would have gained even 
less during the training period than did the control group actually 
used in the second experiment. The former would have had no experience
whatever with the forty experimental combinations;¹¹
while the latter did encounter some of them in a haphazard manner 
during the time that the experimental groups were being trained. 

Thus, in every case where the second experiment produced a 
significant difference in a given direction, a difference in the same 
direction was found to have emerged also from the first experiment. 


DISCUSSION 


The statement was made as an introduction to this paper that 
its major objectives would be the testing of two hypotheses: (1) 
that tachistoscopic training methods are effective in the teaching 
of number skills, and (2) that they are more effective than a con- 
ventional workbook method. 

Actually, there was implicit in the design of the experiments a 
more precise statement of the second hypothesis: the development
of an original tachistoscopic method for comparison not only with the
workbook method but also with the Willey-Billington method implied
the further hypothesis that this new tachistoscopic technique 
would prove superior to the established one. This hypothesis was 
based on a conviction that a period of striving for recall—the 
‘challenge period’, the writer calls it—was important in learning 
and was missing from the procedure described by Willey and 
Billington. Their method, on the other hand, was expected to prove 
superior to the workbook approach because it was designed to rid
subjects of their “uneconomical and disorganized habits of calculation
in favor of more efficient direct responses in the form of a
continuous emergence of precise, economical, mature patterns of
learning.” (14) Hypothesis 2, then, really involves a ranking of the
three experimental methods of training with reference to their
effectiveness in producing improvement in number skills.

11 The course of study did not call for these particular items until later
in the year.

Also implicit in the design of the experiment is a set of hypoth- 
eses about the relationship of computational skills to perception: 
(1) subjects who are superior to their fellows in perceptual ability 
are superior also in computational skill, because the latter is to 
some extent dependent on the former; (2) improvement in percep- 
tual ability is accompanied by an increased effectiveness in com- 
putation; and (3) the tachistoscopic training used in these experi- 
ments is effective in developing perceptual ability of the type 
tested. 

The entire hypothetical structure, together with pertinent re- 
sults from both experiments, is summarized in the following 
outline: 


PERCEPTION 


1) “Subjects who are superior in perceptual ability are also
superior in number skills.” The validity of this hypothesis can be
neither confirmed nor rejected on the basis of correlation coefficients
obtained before training between perception test scores and
arithmetic test scores: in the first experiment, the correlation was
insignificant by inspection of the scatterplot, while the second
produced a correlation of .30, which is significant beyond the .05 level
of confidence. The insignificant r from the first experiment may be
due in some measure to the smallness of the N.

2) “Improvement of perceptual ability results in improvement 
of number skills.” This hypothesis could not be tested, primarily
because of the results reported under hypothesis 3, below. 

3) “The tachistoscopic training used in these experiments is effective
in developing perceptual ability of the type tested.” This
hypothesis must be rejected on the basis of the analysis of variance
of perception test difference scores: F-ratios in both experiments
were insignificant, and since the lowest difference score was in each 
case actually a loss of proficiency, it can be safely concluded that 
no significant gain was made by any of the seven groups tested. 
NUMBER SKILLS


1) “Tachistoscopic training is effective in the teaching of com- 
putational skills.” This hypothesis can clearly be accepted. In the 
first experiment, which lacked a control group, all of the experi- 
mental groups made significant gains in arithmetic test scores of 
the four types analyzed; in the second experiment, there was in 
every analysis a significant difference between the gain made by
the control group and the smallest of the gains made by the ex- 
perimentals. 

2) “The three experimental methods can be ranked according to
their relative effectiveness in the following order: III, II, I.” This
hypothesis must be rejected. In the first experiment, the analysis 
of variance did not disclose significant differences between the 
groups; but the N was small, and all differences were in the same 
direction as the significant differences obtained with a larger N in 
the second experiment. Both experiments ranked the methods as 
follows: Group I (workbook) first, Groups II and III (tachisto- 
scopic) second; Group IV (control) was last in every analysis. 

Caution must attend the extension of these interpretations to 
situations other than the ones described here. Previous experi- 
ments with subjects at the primary-school level, for example, have 
indicated that the tachistoscope can be of value to teachers in a 
number of different areas, including arithmetic. 

It should be noted also that a good many interaction effects are 
possible which could not be examined in the present study. Controls 
were attempted for all the variables that seemed at the outset most
likely to influence the results; but on the one hand it is impossible 
to control every factor that could be included, and on the other 
hand variance contributed by these factors might prove interest- 
ing in its own right. Had conditions permitted, a multivariate 
design would have been more rewarding. 

The results might be challenged from another point of view: it 
could be objected that the standardization that was necessary for 
a clear differentiation of methods had the effect of rendering them 
all stilted and artificial. The criticism is well taken; it is undoubtedly
true that each of the three experimental methods suffered
to some extent from the limitations imposed by experimental con- 
trols. On the other hand, it is also true that the methods were all 
carefully planned to get the most out of each within these limita- 
tions, and there is no apparent reason why the enrichment that 
would be possible in a non-experimental situation should affect
one method any more than another. 


SUMMARY AND CONCLUSIONS 


Previous investigators of tachistoscopic training have not come 
to any agreement regarding its effectiveness with learners of vary- 
ing ages and backgrounds working with many different materials. 
Effects are apparently specific to particular constellations of these 
factors, and conclusions must be limited to the one constellation 
which characterizes a particular investigation. The present study 
is concerned with the effects of fourteen 8-min. training sessions on 
fourth-grade, public-school children working only with arithmetic 
combinations (multiplication and division). The writer concludes 
that in this situation tachistoscopic training is effective in the 
teaching of number skills but that it is not more effective than an 
ordinary workbook method of practice—in fact, the two tachisto- 
scopic methods used proved to be actually less effective than the 
workbook method. Tests of rapid perception showed no gain in 
any of the groups over the period of training. 
Recent researches in the field of behavior have highlighted the
concepts of empathy and personal values. These areas appear to 
hold promise for students of behavior. The ability to empathize 
is important in social intercourse. It appears also to be a fruitful 
source of understanding individual differences in certain areas of 
personality. 

Personal values have shed light upon individual differences in 
motivation. Knowledge of personal values is making a contribu- 
tion in such fields as counseling where an understanding of condi- 
tions of living desired by the individual is important. In these areas, 
values form a triad with aptitude and interests. 

The purpose of this research has been to explore somewhat the 
relationship between empathy and personal values. It is felt that 
through continued research in these areas human behavior may 
become less and less an enigma. The postulates of George H. Mead 
(5) have been very fruitful in throwing light on the functioning 
and structuring of the mind and the self and their relation to 
society. The ‘generalized other’ of which Mead speaks has given 
many hints in the direction of a fuller understanding of the nature 
of social intercourse. Particularly when Cottrell’s rôle theory (1)
is added, one gains much insight in taking the rôle of the ‘other’.
The ‘other’ appears to be the key to unlock many dilemmas in
human relationships. Dymond (2) has made use of the ‘empathic
response’ in her research in the field of “taking the rôle of the
other”. She has completed the shift which has been developing in 
the use of the term empathy. In the use to which she has put it, 
there is no reference to the sensations of inanimate objects being 
internalized but rather reference is made to the internalizing of 
the affect and cognition of the ‘other’. Dymond defines empathy as
follows: “Empathy denotes the imaginative transposing of oneself 
into the thinking, feeling and acting of another and so structuring 
the world as he does.” 

Woodruff (6), using the value concept as a basis, has developed 
a theory of motivation which holds much promise. He defines 
personal values as generalized statements of conditions of living 
which the individual believes have an effect on his well-being— 
positive—negative—neutral. The measurement of values under his 
direction has added to the fields where psychometrics attempts to 
select and predict. Egbert (3) has refined the measurement of 
values to a point where objectivity is possible on the part of the 
psychometrist and his scale has sufficient statistical validity and 
reliability for survey purposes. 

Dymond describes her Rating Test as follows: “The test was
made up of four parts, each containing the same six items. In the
first part the individual was asked to rate himself, on a five-point
scale, on each of six characteristics. In the second part he was asked
to rate the other individual as he himself sees him. In the third part
the individual rates the other person as
he thinks the other person would rate himself. In the fourth he 
must rate himself as he thinks the other would rate him. In other 
words, if two individuals A and B are being tested for their empathy 
with each other, the procedure would be as follows: 

“A. Part 1. A rates himself, (A). Part 2. A rates B as he (A) sees
him. Part 3. A rates B as he thinks B would rate himself. Part 4. 
A rates himself (A) as he thinks B would rate him. 

“B. Part 1. B rates himself, (B). Part 2. B rates A as he (B) sees 
him. Part 3. B rates A as he thinks A would rate himself. Part 4. 
B rates himself (B) as he thinks A would rate him. 

“Therefore a measure of A’s empathic ability can be derived by
calculating how closely his predictions of B’s ratings (A3 and A4)
correspond with B’s actual ratings (B1 and B2). Similarly a measure
of B’s empathy with A can be obtained by calculating how
closely his predictions of A’s ratings (B3 and B4) correspond to
A’s actual ratings (A1 and A2).

“The six traits which were used as the items in all four parts of
the test were: 1. self-confidence, 2. superior-inferior,
3. selfish-unselfish, 4. friendly-unfriendly, 5. leader-follower,
6. sense of humor.
“Although the usual objections to such trait-ratings were recognized,
this procedure was followed none-the-less because the ratings
were not being used to determine the personality of the subjects
nor to determine how accurate the others were to their
estimation of this. The test was designed to answer the question 
how well can the subject transpose himself into the thinking, feel- 
ing and acting of the others. If he can do this he should be able to 
predict how the others will behave in certain defined situations. 
The situation chosen to test this ability was the subject’s ability 
to predict how others will rate themselves and how they will rate 
him on these six traits. (A high score indicated low empathic 
ability and a low score the inverse.)”
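Dymond's scoring can be sketched as a sum of absolute differences between A's predictions and B's actual ratings over the six traits. A perfect predictor then scores 0, and, as the quotation says, a high score means low empathic ability. Summing absolute differences over the traits is an assumption made for this illustration, because the quotation does not spell out the arithmetic. The function name and the sample ratings are hypothetical.

```python
TRAITS = ("self-confidence", "superior-inferior", "selfish-unselfish",
          "friendly-unfriendly", "leader-follower", "sense of humor")


def empathy_deviation(a3, a4, b1, b2):
    """Deviation score for A's empathy with B (lower = more empathic).

    a3: A's predictions of B's self-ratings (A, Part 3)
    a4: A's predictions of B's ratings of A (A, Part 4)
    b1: B's actual self-ratings (B, Part 1)
    b2: B's actual ratings of A (B, Part 2)
    Each argument holds one five-point rating per trait, in TRAITS order.
    """
    for ratings in (a3, a4, b1, b2):
        if len(ratings) != len(TRAITS):
            raise ValueError("expected one rating per trait")
        if any(not 1 <= x <= 5 for x in ratings):
            raise ValueError("ratings must lie on the 1-5 scale")
    return (sum(abs(p - q) for p, q in zip(a3, b1))
            + sum(abs(p - q) for p, q in zip(a4, b2)))


# Hypothetical ratings for one pair of subjects.
b1 = [4, 3, 2, 4, 3, 5]
b2 = [3, 3, 2, 5, 2, 4]
print(empathy_deviation(b1, b2, b1, b2))                  # 0: perfect prediction
print(empathy_deviation([3, 3, 2, 4, 4, 5], b2, b1, b2))  # 2
```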

For the purpose of this study an adapted form of Dymond’s 
test was devised. This form may be used with an I.B.M. answer
sheet.

Egbert’s “A Study of Choices Form VII” gives fifteen values for
a personal value pattern. He lists and describes the values as
follows (3):

1) Esthetic Appreciation—Appreciation of beauty and quality of 
surroundings. 

2) Comfort and Relaxation—Primarily physical comfort with 
some reference to ‘mental comfort’. 

3) Excitement—Experiences which are unusual and which give a 
thrill. 

4) Freedom—Lack of physical, mental or moral restraint from 
any external source. 

5) Friendship—Personal attachments to people outside the 
home. 

6) Family life—The husband, wife and children with the inter- 
relationships that exist among them. 

7) Intellectual Activity—Mental activity as an end in itself. 

8) Personal Improvement—Improvement of appearance and 
facility with the social graces. 

9) Power and Control—Control over associates and followers. 

10) Recognition—Prestige as such, apart, in-so-far as possible. 

11) Religion—Formalized religion with particular emphasis on 
belief in a Supreme Being and a life after death. 

12) Privacy—Being alone and having personal belongings which
can be kept from other people. 

13) Security—Financial and physical security. 
14) Society Life—Mingling with the social upper class and being
active in clubs and night life. 

15) Social Service—Doing things for other people: The Golden 
Rule. 

The device is based on a ranking method. Egbert states there 
are thirty groups of five statements of values. The problem is to 
rank the statements in each group according to their importance 
for the happiness of the respondent. Each value appears ten times 
in the test and each value is in the same group with each other 
value at least twice and not more than four times. 
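Egbert's balance conditions can be checked mechanically. Thirty groups of five give 150 places, which is exactly enough for 15 values appearing ten times each. Within the groups there are 30 × 10 = 300 pairings, spread over the 105 possible pairs of values. That averages about 2.9 per pair, inside the stated range of two to four. The checker below is a hypothetical helper for testing any candidate set of groups. It does not reproduce Egbert's actual item groupings, which are not given here.

```python
from collections import Counter
from itertools import combinations


def meets_balance(groups, values, per_value=10, min_pair=2, max_pair=4):
    """Check a ranking design: each value appears per_value times, and every
    pair of values shares a group between min_pair and max_pair times."""
    if any(len(set(g)) != len(g) for g in groups):
        return False  # a value may not appear twice in the same group
    appearances = Counter(v for g in groups for v in g)
    together = Counter(frozenset(p) for g in groups for p in combinations(g, 2))
    if any(appearances[v] != per_value for v in values):
        return False
    return all(min_pair <= together[frozenset(p)] <= max_pair
               for p in combinations(values, 2))


# Egbert's stated dimensions: 30 groups x 5 places = 15 values x 10 appearances.
assert 30 * 5 == 15 * 10
# Toy design: every 3-subset of four values, so each value appears 3 times
# and each pair of values shares a group exactly twice.
toy = [list(c) for c in combinations("ABCD", 3)]
print(meets_balance(toy, "ABCD", per_value=3, min_pair=2, max_pair=2))  # True
```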

The test correlates well with Woodruff’s Study of Choices. In- 
formal interview has also indicated its validity. Reliability coeffi- 
cients based on the Froelich index of test reliability have a median 
coefficient of 0.690. 

Eighty college sophomores and juniors, students of elementary 
educational psychology, were selected for subjects. Egbert’s value 
test was administered to derive a value pattern for each subject. 
A questionnaire was filled out by each subject in which he ranked 
the other subjects from ‘know casually in class’ to ‘close friends’. 
For the administration of the empathy test, subjects were placed in
groups of four. Subjects were put in the same group only if their
degree of friendship was more than ‘know casually in class’ and less
than ‘close friends’.

Dymond’s empathy test was administered to the population and 
scored according to the deviation score. This gives each subject an 
‘empathic ability’ score and an ‘ability to be empathized with’ 
score. 
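With four subjects per group, each subject predicts the other three and is predicted by them. The sketch below shows one plausible way to combine pairwise deviation scores into the two per-subject scores. Summing over partners, rather than averaging, is an assumption. `pair_deviation` is a hypothetical callable that stands for any pairwise scoring, such as a deviation score for A's empathy with B.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple


def group_scores(
    members: Sequence[str],
    pair_deviation: Callable[[str, str], float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (empathic_ability, ability_to_be_empathized_with) deviation totals.

    pair_deviation(a, b) is a's deviation score when predicting b; lower
    totals mean greater ability, as in the original scoring.
    """
    empathic = {m: 0.0 for m in members}
    empathized_with = {m: 0.0 for m in members}
    for a, b in permutations(members, 2):
        d = pair_deviation(a, b)
        empathic[a] += d          # how well a predicts the others
        empathized_with[b] += d   # how well the others predict b
    return empathic, empathized_with


# Hypothetical pairwise deviations: A misjudges everyone by 2, others are exact.
demo = group_scores("ABCD", lambda a, b: 2.0 if a == "A" else 0.0)
print(demo)
```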

Table 1 is concerned with the twenty subjects in the population
who ranked lowest and the twenty who ranked highest on each
value. Their empathic ability scores are also shown in this table.

A difference significant only at the .10 level was found between
the low social service and the high social service groups. This
difference favors the high social service group, that is, the subjects
having the greater empathic ability. Since social service presupposes
group activity and social intercourse, a difference in favor of those
with the greatest empathic ability is what one would expect.

The value tests were tabulated for deviation; that is, a score was
derived for each individual from the deviations of his value scores
from the theoretical mean of 30 for each value. Positive and negative
deviations were added alike, as it was felt the direction of deviation
was not the significant factor. The means for the high twenty and the
low twenty in empathic ability were tabulated and then tested for
statistical significance. A ‘t’ significant at the .05 level was
obtained, indicating that subjects with greater empathic ability had
significantly larger deviation scores.

TABLE 1.—EMPATHIC ABILITY FOR GROUPS HIGH AND LOW FOR EACH VALUE

Values                  | Total | High  | Low   | Difference | t    | Level
1) Esthetic apprec.     | 23.72 | 25.60 | 23.40 | 2.20       |      |
2) Comfort              | 23.72 | 25.30 | 22.95 | 2.35       |      |
3) Excitement           | 23.72 | 23.35 | 22.55 | 0.80       |      |
4) Freedom              | 23.72 | 24.35 | 23.80 | 0.55       |      |
5) Friendship           | 23.72 | 24.10 | 24.65 | 0.55       |      |
6) Family life          | 23.72 | 21.10 | 24.50 | 2.90       |      |
7) Intellectual act.    | 23.72 | 22.90 | 23.95 | 1.05       |      |
8) Personal improv.     | 23.72 | 23.30 | 23.30 | 0.00       |      |
9) Power and control    | 23.72 | 24.75 | 23.90 | 0.85       |      |
10) Recognition         | 23.72 | 23.10 | 24.50 | 1.40       |      |
11) Religion            | 23.72 | 23.75 | 23.75 | 0.00       |      |
12) Privacy             | 23.72 | 23.70 | 24.20 | 0.50       |      |
13) Security            | 23.72 | 23.55 | 25.20 | 1.65       |      |
14) Society life        | 23.72 | 24.00 | 25.20 | 1.20       |      |
15) Social service      | 23.72 | 21.45 | 24.55 | 3.10       | 1.89 | .10
16) Deviation score     | 23.72 | 23.70 | 25.85 | 2.15       |      |
N                       | 80    | 20    | 20    |            |      |
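The deviation score described above can be written as Σ|vᵢ − 30| over the fifteen values. The sketch assumes, as the theoretical mean of 30 implies, that each value's score is the sum of its rank points (1 to 5) over its ten appearances. On that assumption every complete record totals 15 × 30 = 450. Both the total check and the names are assumptions made for illustration. A flat pattern scores near 0 and a sharply peaked one scores high.

```python
THEORETICAL_MEAN = 30    # ten appearances x mean rank 3 on a 1-5 ranking
N_VALUES = 15


def deviation_score(value_scores):
    """Sum of absolute deviations of one subject's value scores from 30.

    Direction is ignored, as in the study: positive and negative deviations
    count alike.
    """
    if len(value_scores) != N_VALUES:
        raise ValueError("expected 15 value scores")
    if sum(value_scores) != N_VALUES * THEORETICAL_MEAN:
        raise ValueError("a complete ranking record should total 450")
    return sum(abs(s - THEORETICAL_MEAN) for s in value_scores)


flat = [30] * 15                       # no preferences at all
peaked = [45, 40, 15, 20] + [30] * 11  # hypothetical record with clear peaks
print(deviation_score(flat), deviation_score(peaked))  # 0 50
```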

It was felt that the deviation score may be a measure of maturity 
(4)—the more mature individual having a more defined pattern. 
He has differentiated his values, precluding many contradictions 
that might appear in the value pattern of a less mature person. He 
therefore may need less effort to determine his own status and more 
effort can be expended to clarify relationships. 

Table 2 is the companion of the preceding table. It gives the mean
‘ability to be empathized with’ score of the high twenty and the low
twenty for each value named. Significant differences favoring the
high group were found for family life (.02 level) and for security
(.05 level); in each case the high group had the lower mean deviation,
that is, the greater ability to be empathized with. Social service
yielded a ‘t’ of 2.54, significant at the .02 level, again in favor of
the high group. Religion was significant only at the .10 level, also
favoring the high group. These values are all important in group
living. Those individuals who function well in social relations
appear also to lead in values of a social nature.

TABLE 2.—“ABILITY TO BE EMPATHIZED WITH” FOR GROUPS HIGH AND
LOW FOR EACH VALUE

Values                  | Total | High  | Low   | Difference | t    | Level
1) Esthetic apprec.     | 23.83 | 25.70 | 22.55 | 3.15       |      |
2) Comfort              | 23.83 | 25.95 | 23.60 | 2.35       |      |
3) Excitement           | 23.83 | 25.20 | 22.70 | 2.50       |      |
4) Freedom              | 23.83 | 25.60 | 24.05 | 1.55       |      |
5) Friendship           | 23.83 | 24.30 | 25.05 | 0.75       |      |
6) Family life          | 23.83 | 22.20 | 27.40 | 5.20       | 2.77 | .02
7) Intellectual act.    | 23.83 | 23.70 | 25.90 | 2.20       |      |
8) Personal improv.     | 23.83 | 24.05 | 23.35 | 0.70       |      |
9) Power and control    | 23.83 | 23.35 | 23.95 | 0.60       |      |
10) Recognition         | 23.83 | 24.80 | 23.35 | 1.45       |      |
11) Religion            | 23.83 | 22.40 | 26.30 | 3.90       | 1.99 | .10
12) Privacy             | 23.83 | 25.40 | 23.10 | 2.30       |      |
13) Security            | 23.83 | 21.35 | 25.60 | 4.25       | 2.51 | .05
14) Society life        | 23.83 | 26.50 | 23.30 | 3.20       |      |
15) Social service      | 23.83 | 21.75 | 26.05 | 4.30       | 2.54 | .02
16) Deviation score     | 23.83 | 23.85 | 28.45 | 4.60       | 2.86 | .01
N                       | 80    | 20    | 20    |            |      |

The table also indicates a significant difference for the deviation
score: the group with the greater deviation scores was the more
readily empathized with, the difference being significant at the .01
level and reflecting what may be thought of as greater maturity.


TABLE 3.—VALUE PATTERNS FOR TEN SUBJECTS WITH WHOM OTHERS
EMPATHIZE BEST

Values                      | Score
1) Family Life              | 41.40
2) Friendship               | 38.30
3) Security                 | 37.50
4) Personal Improvement     | 34.60
5) Social Service           | 34.40
6) Freedom                  | 33.00
7) Religion                 | 33.00
8) Comfort                  | (illegible)
9) Power and Control        | 28.40
10) Recognition             | 27.60
11) Intellectual Activity   | 26.80
12) Society Life            | (illegible)
13) Excitement              | 21.70
14) Privacy                 | 21.70
15) Esthetic Appreciation   | 19.30
16) Deviation Score         | (illegible)
The composite value pattern for the ten subjects with whom
others empathize best is shown in Table 3. It is noted that the val- 
ues that rank highest are, in general, those which are heavily 
weighted by social intercourse. 

The correlation between empathic ability and the ability to be
empathized with was found to be +0.335. Research by Dymond¹ has
found a correlation of +0.60 between empathic ability and ability to
be empathized with.


SUMMARY AND CONCLUSIONS 


Dymond’s Rating Test for measuring empathic ability and 
Egbert’s Study of Choices, Form VII were administered to eighty 
college students of educational psychology. They were grouped in 
diverse ways to reflect certain value pattern combinations. 

Results were tabulated statistically to determine degree and 
type of relationship existing between personal values and empathy. 

Results indicate that the subjects with high empathic ability— 
particularly the ‘ability to be empathized with’—tend to have the 
highest values in the areas where group interaction and social in- 
tercourse are major factors. Where the high values are less depend- 
ent upon group life and can be satisfied by individual action, 
empathy scores appear to be lower. 

Conclusions that certain other factors are operating in the rela- 
tionship appear to be warranted. They are: 

1) Empathy appears to be related to the amount of dispersion 
within a given pattern; i.e., if the value pattern is flat, or has few 
peaks, empathy scores are lower. 

2) Empathy seems to be positively related to such value factors 
as family life and social service. For the values comfort, power and 
control, and society life an inverse relationship is indicated. 

3) A correlation coefficient of 0.335 was found between empathic 
ability and the ‘ability to be empathized with’. 
COLLEGE GRADES AND SELF-ESTIMATES
OF INTELLIGENCE 


ORVILLE G. BRIM JR. 


University of Wisconsin 


This paper reports a study of the relationship between individual 
self-estimates of intellectual rank and college academic perform- 
ance as measured by individual grade-point average. The study 
continues some of the exploratory work on the influence of non- 
intellective factors upon high levels of academic and occupational 
achievement which began in another research program.¹

The major proposition of the study is that the grade-level at- 
tained by an individual varies directly with his own estimate of his 
intelligence. The relevant data are those showing the relationship 
between the subject’s actual grade-point average and self-estimate 
of intelligence.

It is further proposed that this relationship results in part from 
a tendency of the grade-point average sought by an individual, as 
well as the strength of his motivation to achieve it, to vary directly 
with his own estimate of his intellectual rank. Thus two types of 
relevant data here are the relationship of responses to questions 
about grade aspiration and about motivation to self-estimates of 
intelligence. 


METHOD 


Questionnaires were administered to one hundred and three 
students in an advanced social psychology class (no freshmen,
eleven sophomores) at the University of Wisconsin. 

Data on self-estimated intelligence were collected in the follow- 
ing way. Each respondent was asked to assume that he was among 
one hundred students taken at random from his complete class at 
the University, and that this group of one hundred was then given 
a series of intelligence tests and ranked from 1 (high) to 100 (low) 
on general intelligence on the basis of the test results. The respondent
was then asked to estimate what his position or rank would be
among these one hundred students.

1 The writer was formerly a staff member of the project, “Cultural Factors
in Talent Development,” under the general direction of Fred L.
Strodtbeck of Yale University.

Respondents were given the choice of signing their names or of 
leaving the questionnaire anonymous. Of the one hundred and 
three respondents, seventeen did not identify themselves. There 
were no differences between signers and non-signers on the charac- 
teristics of sex and of class in the University. The non-signers’ 
self-estimates of intelligence were higher than those of the signers, 
the mean of the estimates being the 31st rank in 100 for the former 
and the 39th rank for the latter. A t test (2, Ch. 12), however,
shows the probability of this difference to exceed the .05 level, and 
therefore in subsequent analysis the two groups are considered 
together where appropriate. 

For the respondents who identified themselves it was possible to 
get both the cumulative grade-point average and a measure of 
their intelligence from University records.² The intelligence measure
was the American Council Psychological Examination (1948
edition). The A.C.E. test norms used in computing the percentiles 
were those for the entering class of the individual in question. 


RESULTS 


1) Relationship of Actual Grade-point Average to Estimated Intel- 
lectual Rank.—In our analysis it is necessary to separate the in- 
fluence upon grade-point average of both actual and estimated 
intelligence. The correlation³ between these two intelligence measures
was .43. The subjects in general overestimated their intellectual
rank. The mean estimated rank for identified subjects, as
noted above, was the 39th. Their mean rank on the A.C.E. was the 
52nd percentile, which when reversed in direction to be comparable 
to our data becomes the 48th. Hence the subjects on the average 
estimated their rank to be nine percentiles higher than the test 
results. 

The relationship of both actual and estimated intellectual rank
to grade-point average is shown in simple form in Table 1. The cell
values were generated by cutting the rank order of both actual
intelligence and estimates at the median, so as to equalize the
distribution as nearly as possible, and then computing the mean
grade-point average of the individuals in each cell.

TABLE 1.—MEAN GRADE-POINT AVERAGE BY A.C.E. AND ESTIMATED
INTELLECTUAL RANK*

Estimated Rank            | Low  | Low  | High | High
A.C.E. Rank               | Low  | High | Low  | High
Mean Grade-Point Average  | 1.54 | 1.67 | 1.76 | 1.97

* In terms of letter grades, 1 equals C, 2 equals B, 3 equals A.

2 The writer is indebted to Dr. Joseph L. Lins, Director of Student Personnel
and Studies at Wisconsin, for his assistance in this task.

3 Pearsonian correlation coefficients are used throughout. See McNemar
(2, Ch. 6).
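The median-split tabulation can be sketched as follows. The sketch assumes that both intelligence measures have already been oriented so that larger numbers mean higher standing; the study's ranks run the other way, with 1 as high. It also places subjects who tie with the median in the low half. Both choices, the function name, and the sample records are hypothetical.

```python
from statistics import fmean, median


def median_split_means(records):
    """Mean GPA in each (estimate half, A.C.E. half) cell.

    records: iterable of (ace, estimate, gpa) tuples, with larger ace and
    estimate values meaning higher intellectual standing.
    """
    records = list(records)
    ace_cut = median(r[0] for r in records)
    est_cut = median(r[1] for r in records)
    cells = {}
    for ace, est, gpa in records:
        key = ("high" if est > est_cut else "low",
               "high" if ace > ace_cut else "low")
        cells.setdefault(key, []).append(gpa)
    return {key: fmean(gpas) for key, gpas in cells.items()}


# Hypothetical records: (A.C.E. percentile, self-estimated percentile, GPA).
sample = [(20, 30, 1.4), (70, 40, 1.7), (30, 80, 1.8), (80, 90, 2.1)]
print(median_split_means(sample))
```

The cell keys follow the column order of Table 1: the estimated-rank half first, then the A.C.E. half.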

The differences between means show the influence of actual in- 
telligence, which is to be expected. But the differences also show 
that with actual intelligence roughly constant, those persons who 
believe they have higher intelligence get the higher grades. 

The correlations between the subjects’ grade-point average and 
both their actual and estimated intelligence are .31 and .32, re- 
spectively, which are significant at the .01 level. 

With the linear effects of actual intelligence held constant, the 
partial correlation (3, Ch. 9) between grade-point average and 
estimated intelligence is .20. The partial correlation between actual 
intelligence and grades with estimates held constant is also .20. A 
value of .22 would be necessary to reach the .05 level of significance. 

Our interpretation is that both variables, actual and estimated 
intellectual rank, are making a small but independent contribution 
to differences in grade-point average, with individuals presumably 
being influenced predominantly by one or the other. 

The multiple correlation of both measures of intelligence with 
grade-point average was .38, indicating that either of the simple 
correlations is almost as good a predictive measure as are both 
variables taken together. None of the correlations, however, is 
high enough to be of much value in prediction.⁴
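The partial and multiple coefficients follow from the three zero-order correlations by the standard formulas:

r(12.3) = (r12 − r13·r23) / √((1 − r13²)(1 − r23²))
R² = (r12² + r13² − 2·r12·r13·r23) / (1 − r23²)

The sketch below applies these formulas to the rounded values given in the text (.31, .32, .43). It yields about .22 and .20 for the two partials and about .37 for R. These are close to the reported figures; the small gaps presumably reflect rounding of the inputs. The function names are illustrative.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of y with two predictors 1 and 2."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)


R_GPA_ACE, R_GPA_EST, R_ACE_EST = 0.31, 0.32, 0.43  # values reported in the text

# Grade-point average and estimate, with A.C.E. held constant.
print(round(partial_r(R_GPA_EST, R_GPA_ACE, R_ACE_EST), 2))
# Grade-point average and A.C.E., with the estimate held constant.
print(round(partial_r(R_GPA_ACE, R_GPA_EST, R_ACE_EST), 2))
# Grade-point average on both measures together.
print(round(multiple_r(R_GPA_ACE, R_GPA_EST, R_ACE_EST), 2))
```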

2) Relationship of Desired Grade-point Average to Estimated Intel- 
lectual Rank.—Our predictions were that a high estimate of intel- 
lectual rank would be associated with a high grade level sought by 





4 In addition to the correlation data, a simple Guttman coefficient of
predictability was computed for actual and estimated rank combined. In 
this case C equalled .39. 
the student, and a strong motivation to achieve this level. Such
predictions would follow from the theoretical work of Horney (1), 
Murphy (4), Sherif and Cantril (5) and others. 

An individual’s self-image is composed of segments, e.g., one 
thinks of himself as big, strong, and handsome. Where segments of 
the self-image are valued by the individual, ego-motives arise. 
These motives are directed to goals which, if achieved, the indi- 
vidual can interpret as evidence that he is in fact what he considers 
himself to be. 

The segment of self-image involved in this study is one’s con- 
ception of himself as ‘bright’, ‘average’, ‘dumb’, and the like. Most 
normal individuals in our culture positively value such conceptions 
as brightness; i.e., conceptions of being highly intelligent.
Individuals who have such self-conceptions of high intelligence are
motivated to protect this conception, to act in such a way that 
corroborating evidence results. 

Among American college students, the achievement of high 
grades is customarily held to be evidence of high intelligence. 
Individuals among this group who conceive themselves as ranking 
high in intelligence would therefore normally seek high grades as 
evidence for this conception. 

However, if we hold that grades are taken as evidence of one’s 
intellectual abilities, then we must recognize at this point the 
criticism that the correlations reported above may simply reflect 
the fact that the self-estimates are based on grades, and do not 
stand in any causal relationship to them at all. Our answer is that 
the relationship between college grades and self-estimates of in- 
telligence is probably a fluid one in which a gradual accommodation 
is reached between the self-conception and the realities represented 
by grades. The remainder of the paper presents data which indicate
that self-estimates do play a causal role in determining grades,
and that therefore the correlations reported above represent in
part the adjustment of grades to self-conceptions, and not solely
the reverse.⁵





5 An experimental design giving an adequate answer to this question 
would be one in which randomly selected groups of entering college students 
were given information on their intellectual rank, with one group being 
placed above their actual test scores, another below, and so on. The groups 
could then be followed through college, and their achieved grades compared 
with the grades predicted on the basis of customary psychometric tech- 
niques. 
TABLE 2.—CHANGE IN GRADE ASPIRATION WITH ASSUMED RISE IN
INTELLECTUAL RANK

Response                      Number of Subjects
Try for better grades
Try for less
Not change at all
No answer


The tendency of aspiration level concerning grades to vary with 
one’s belief about his intellectual rank was explored by means of 
hypothetical situations. Each subject was asked to assume that he 
had found out that he was much more intelligent than he thought, 
that his true rank was much higher, and then was asked to indicate 
whether he would ‘try for better grades’ than he was trying for 
now, try for less than now, or ‘not change feelings at all about 
trying for grades’. The distribution of responses, shown in Table 2,
supports the prediction.

The same information was collected again, but with each subject 
asked to assume that he had discovered that actually his intellec- 
tual rank was much lower than he thought. The distribution of 
responses is given in Table 3. 

The writer can offer no systematic explanation for the responses 
of the eighteen subjects who would try for better grades as a result 
of finding they were much less intelligent than they thought. 

In regard to the ‘non-changers’ in both situations, it seemed
probable that these were subjects who fell toward the extremes in
self-estimates. Those who already have high self-estimates would
be unlikely to raise their aspirations on finding they are even more 
intelligent. Conversely, those subjects with low self-estimates 
would be unlikely to lower their desires after finding out they are 
even lower than they thought. 


TABLE 3.—CHANGE IN GRADE ASPIRATION WITH ASSUMED FALL IN
INTELLECTUAL RANK

Response                      Number of Subjects
Try for better grades                 18
Try for less                          50
Not change at all                     27
No answer                              8

This possible explanation of lack of change in aspiration was
tested by splitting the subjects into high, middle, and low groups
in regard to their own estimates of intelligence, choosing cut points
that made the three groups as nearly equal in size as possible. There
was, however, no significant difference between the three groups in
the number of non-changers, thus indicating that the explanation
lies elsewhere.

3) Relationship of Strength of Motivation to Estimated Intellectual 
Rank.—As a measure of strength of motivation each subject was 
asked to indicate what his feelings were when he failed to achieve 
the grade he had set as a goal in any particular instance; i.e., if he 
failed to do as well as he was trying to do in regard to grades. The 
indication of concern was made along a nine-point scale ranging
from ‘indifferent’ (0) to ‘extremely upset’ (9).

Responses to this item were placed into high and low categories
of concern by cutting at the point (between 6 and 7) which made
the numbers in the two categories most nearly equal. The number of
responses in each category by subjects in the high, middle, and low
groups in regard to self-estimated intellectual rank is given in
Table 4. A chi-square test indicates the difference in distribution
of responses to be significant, with P < .05.
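The chi-square test on Table 4 can be checked directly against the printed counts. The sketch below computes the Pearson chi-square for the 3 × 2 table of estimated rank by concern, excluding the ‘no answer’ cases as the table does. With these counts it gives about 6.49 on 2 degrees of freedom, P ≈ .04, which agrees with the P < .05 reported. The closed-form tail probability used here holds only for 2 degrees of freedom.

```python
from math import exp

# Table 4 counts: rows = estimated rank (High, Middle, Low);
# columns = concern over failure (Low, High)
TABLE_4 = [[16, 15], [27, 7], [17, 14]]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square(TABLE_4)
assert df == 2  # exp(-x/2) is the exact upper-tail probability only for 2 df
p = exp(-stat / 2)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```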

The results support our contention that subjects with high self- 
estimates would show greater concern. However, they show in 
addition that many of the subjects with lowest estimates are also 
highly concerned. One explanation of this may be that subjects in 
this group are concerned for quite practical reasons. The scholastic 
average of this group is low. They are concerned not over threats
to the ego-picture, as are the high subjects, but rather over the
possibility of being put on probation or dropped from school for
failure to keep up their averages.


TABLE 4.—DEGREE OF CONCERN OVER FAILURE BY ESTIMATED
INTELLECTUAL RANK*

                     Concern
Rank              Low     High
High               16       15
Middle             27        7
Low                17       14

* There were seven ‘no answers’ on the question about amount of
concern.

The relationship between motivation and self-estimates was ex- 
plored in an additional way, in a hypothetical situation. Each sub- 
ject was asked to indicate how he would feel if he failed to get the 
grades he would try for after finding out that he was much more 
intelligent than he thought. The same nine-point scale for indicat- 
ing responses was used. 

The mean concern indicated in response to this hypothetical 
situation was six on the scale. This was identical with the mean
concern expressed by the subjects over their actual failures. This 
finding was contrary to the prediction that the mean concern would 
be higher in the hypothetical case. 

Identical information was collected in regard to a hypothetical 
discovery of much lower intelligence. The mean concern indicated 
was five on the scale. A t test indicates this to be significantly lower
than the concern expressed over actual failures, with P < .001.
This difference is in accord with our prediction. 
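The text does not say which form of t test was used. Because the same subjects rated both their actual and their hypothetical concern, a dependent-samples (paired) t is one plausible reading. The sketch below assumes that form and uses hypothetical ratings, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Dependent-samples t for two ratings made by the same subjects.

    Returns (t, degrees of freedom); t is negative when `second` is lower on average.
    """
    if len(first) != len(second):
        raise ValueError("ratings must be paired one-to-one")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical ratings on the concern scale, for illustration only
actual = [6, 7, 5, 6, 8]
hypothetical_lower = [5, 6, 5, 4, 7]
t, df = paired_t(actual, hypothetical_lower)
print(f"t = {t:.2f}, df = {df}")
```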

There was no relationship between original self-estimates of 
rank, and changes in degree of concern following the hypothetical 
changes in rank. This is in accord with the finding on lack of 
changes in grade aspiration under the same conditions. 


SUMMARY 


A questionnaire study was carried out to determine the relation- 
ship between self-estimates of intelligence and college grades. A 
small but significant correlation of .32 was found. When actual in- 
telligence (A.C.E. percentile) is controlled, the correlation drops 
to .20, which is slightly below the .05 level of significance. 

When the subjects were asked to assume that they discovered 
their intellectual rank was much higher and much lower than they 
had estimated, their reported aspiration level for grades showed a 
corresponding increase and decrease. 

Significantly more of the subjects in the high and low thirds of
the self-estimated intellectual ranking than in the middle third
reported a high degree of concern over failure to achieve the
grades they desired.

When subjects were asked to estimate their concern over failure
after assuming their intellectual rank to be much higher and much
lower than they had estimated, their concern did not differ from
their existing concern over actual failures in the first case, but
in the second it was significantly lower.
SPELLING ABILITY AND VOCABULARY
LEVEL OF ONE HUNDRED COLLEGE 
FRESHMEN 


FRANK LAYCOCK 


Assistant Professor of Education, University of California, Riverside, Calif. 


High-school and college English teachers chronically deplore the 
poor spelling they find in student essays and occasionally suggest 
that it hampers effective expression. If an idea seems to demand a 
certain word, the argument goes, but the word is hard to spell, a 
prudent writer will search for a substitute he knows he can spell. 
If he chooses a less suitable substitute, clarity loses out to prudence, 
and the solution must be to improve spelling. 

This reasoning permits three hypotheses: (1) the complexity of 
prose vocabulary correlates positively with spelling ability (so that 
poor spellers restrict their written vocabulary to words they can
spell); (2) vocabulary complexity shows no correlation with spelling
ability; and (3) vocabulary complexity correlates negatively with 
spelling ability. This investigation tested these hypotheses on a 
group of college students. 


PROCEDURE 


Spelling.—In a preliminary trial Hudelson’s (3) ‘Seven 8’ list 
yielded thirty-five words suitable for a college-level spelling test. 
Fifteen additional, harder words came from an unpublished scale 
by Luther C. Gilbert, making a total of fifty words. A large class 
containing nearly one hundred and fifty freshmen took the spelling 
test. Introductory directions explained the situation and invited
cooperation; mimeographed sheets provided space for writing
personal data and the fifty words; an unhurried, friendly
atmosphere dissipated tension. Each of the words in the test was
pronounced clearly, read in a sentence, and pronounced again. The
whole process for a word was repeated on request; to finish, the
words alone were repeated in order.

Students.—In the class tested, one hundred and forty freshmen 
turned in usable tests. Further selection eliminated: immigrants 
handicapped in English; students claiming freshman status whose 
records showed a year or more of college work; students whose 
English placement tests (containing the essays to be used for
measuring vocabulary) were not available; and students who had
already studied English composition at college. There were one
hundred and two students left; two more were arbitrarily excluded
to provide an even one hundred. These one hundred, from a course
which annually attracted a large student group and had no
restrictions on admission, were representative of freshmen beginning
their second semester of college.

Vocabulary.—To measure written vocabulary, the essays were 
used which are part of the matriculation examination in English. 
Every entrant must take the test and is likely to try his best, be- 
cause it is widely known that failure means separate non-credit 
instruction and a certain popular stigma. Instructions clearly state 
that mechanical errors like spelling are penalized and that wise 
choice of words determines much of the total score. The suggested 
length of five hundred words is a generous sample on which to base 
an average. 

A separate tally sheet was prepared for each student, with space 
for pertinent data taken from the essay folder. Horn’s (2) 10,000- 
word list was used to measure complexity in preference to that of 
Thorndike (4) or Ayres (1) because it was based on school work 
and letters as well as published reading material. Horn considered 
derivatives separately, and had a system of weighting and credits 
which made the final tabulation as accurate as possible. For each
word in a student’s essay which was among Horn’s 10,000 most 
common words, a score was tallied to indicate the weighted fre- 
quency assigned to it by Horn. This weighted score represented 
the frequency with which the word was met in tallying over 
5,000,000 running words. A separate space was reserved for words 
not in Horn’s list and for proper names (arbitrarily ignored by 
Horn). Words not among Horn’s 10,000 are presumably either 
more difficult or more specialized—and in either case would there- 
fore not be words most commonly used in discussion. However, the 
topics listed as acceptable for essays (for instance, “Juvenile delin- 
quency as a post-war problem’’) called for special and perhaps 
difficult terminology. These uncommon words were tallied sepa- 
rately even though no weighted score was available, because leav- 
ing them out would eliminate any indication of either a rich 
vocabulary or the restrictions spelling might impose. 
RESULTS


Spelling.—For each test, the number of words correctly spelled
was converted in the usual way to standard scores having a mean
of 50 and a σ of 10. The raw scores ranged from 12 to 48 around
a mean of 30.7, with σ = 6.8. Thus the range of standard scores
was from 21 to 75. The raw scores were tested for skewness, and
found to be sufficiently normal in distribution to warrant
conversion to standard scores.
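The ‘usual way’ of converting raw scores to standard scores with a mean of 50 and a σ of 10 is presumably the linear transformation sketched below, using the group mean and σ reported above. Applied to the highest raw score (48), it reproduces the top of the reported range.

```python
def standard_score(raw: float, group_mean: float, group_sd: float) -> float:
    """Linear standard score with mean 50 and standard deviation 10."""
    return 50 + 10 * (raw - group_mean) / group_sd

# Spelling: raw mean 30.7, sigma 6.8 (from the text)
print(round(standard_score(48, 30.7, 6.8)))  # 75, the top of the reported range
```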

Vocabulary.—Analyzing vocabulary was complicated by the 
very wide range of credit numbers (from 11 to over 700,000). 
Walker’s (5) suggestion helped: “When the frequency piles up at
one end of a distribution, use of a unit which is fine enough to bring 
out the form of the distribution in that part of the range where the 
number of cases is large may produce a very long table if the same 
interval is used throughout. In such cases, the width of interval is 
sometimes changed from one part of the range to another. Compu- 
tations may be carried out with the gross scores which correspond 
to the midpoints of the various intervals.” 

Accordingly, intervals were arranged, and for each word the 
midpoint of the interval within which its credit number fell was 
tallied. Each essay was thus reduced to a tally of interval midpoints,
and the mean of these midpoints represented the vocabulary level
of the essay. For the one hundred essays, scores ranged from
208,706 to 303,891 around a mean of 254,216, with σ = 9,895.
Thus, in an essay given a score of 208,706, the average word was 
equivalent to one which Horn tallied 208,706 times in examining 
over 5,000,000 running words. These raw vocabulary scores were 
converted to standard scores for easy comparison with spelling 
standard scores. However, a reversal was necessary. Large spelling 
scores represent superior ability; large vocabulary scores indicate 
use of common words—that is, in this context, a poorer vocabulary. 
Vocabulary standard scores were therefore reversed by subtracting 
them from 100 (theoretical maximum) before being entered, so 
that a large standard score would represent superior vocabulary. 
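The vocabulary scoring works in three steps. Each word's Horn credit number is replaced by the midpoint of its unequal-width interval, the midpoints are averaged over the essay, and the resulting standard score is reversed by subtracting it from 100. The interval boundaries in this Python sketch are hypothetical, because the ones actually used are not given. Only the procedure follows the text.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical interval lower bounds for Horn credit numbers (unequal widths,
# narrow where cases pile up); the study's actual intervals are not reported.
LOWER_BOUNDS = [0, 1_000, 10_000, 100_000, 400_000]
TOP = 800_000  # hypothetical upper limit of the last interval

def interval_midpoint(credit: float) -> float:
    """Midpoint of the interval containing `credit` (credit assumed >= 0)."""
    i = bisect_right(LOWER_BOUNDS, credit) - 1
    low = LOWER_BOUNDS[i]
    high = LOWER_BOUNDS[i + 1] if i + 1 < len(LOWER_BOUNDS) else TOP
    return (low + high) / 2

def essay_vocabulary_level(credits: list[float]) -> float:
    """Mean of the interval midpoints of the words tallied for one essay."""
    return mean(interval_midpoint(c) for c in credits)

def reversed_standard_score(raw: float, group_mean: float, group_sd: float) -> float:
    """Standard score (mean 50, sigma 10) subtracted from 100, so that a high
    score means less common words, i.e., a richer vocabulary."""
    return 100 - (50 + 10 * (raw - group_mean) / group_sd)

print(essay_vocabulary_level([500, 1_000, 500_000]))
```

Subtracting from 100 reflects the scores about 50. The mean therefore stays 50 and σ stays 10; only the direction changes.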

Correlation of spelling and vocabulary scores.—The aim of the in- 
vestigation was to see how spelling and vocabulary relate to each 
other. Correlating the two sets of scores gave a Pearson r = .075
± .067, indicating a very nearly chance relationship. This absence
of correlation supports the second hypothesis, and it therefore
cannot be concluded that poor spelling ability noticeably constricts
written vocabulary.

488 The Journal of Educational Psychology

TABLE 1.—SPELLING TEST: BEST, MIDDLE, AND WORST STANDARD SCORES
WITH CORRESPONDING VOCABULARY STANDARD SCORES

        Worst Scores              Middle Scores              Best Scores
 Stud.  Splng.  Vocab.     Stud.  Splng.  Vocab.     Stud.  Splng.  Vocab.
    1     21      43         46     50      54*        91     63      60*
    2     25      36         47     51      49*        92     63      51
    3     28      65         48     51      52*        93     65      62*
    4     28      55         49     53      42         94     66      38
    5     28      42         50     53      62         95     66      35
    6     28      53         51     53      59         96     66      45
    7     30      57         52     54      47         97     66      43
    8     31      50         53     54      44         98     68      64*
    9     31      37         54     54      58*        99     68      66*
   10     31      34*        55     54      49*       100     75      58

* Vocabulary scores within five standard score points (= 0.5 σ) of
corresponding spelling scores.
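The text does not name the statistic behind the ± .067 attached to r = .075. The figure is consistent with the classical probable error of a correlation coefficient, 0.6745(1 − r²)/√N, taken with N = 100, so this reading is an inference. A minimal sketch of both computations:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def probable_error_of_r(r, n):
    """Classical probable error of r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

print(round(probable_error_of_r(0.075, 100), 3))  # 0.067
```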

Further analyses.—In case the over-all correlation concealed re- 
lationships that might be revealed by extreme scores, two com- 
parisons were made: (1) the best ten, worst ten, and middle ten 
spelling scores were paired with their corresponding vocabulary 
scores; and (2) the reverse comparison placed the best, worst, and 
middle vocabulary scores alongside corresponding spelling scores. 
Tables 1 and 2 show this comparison. The small number of entries 
in each group did not warrant elaborate statistical procedures. 
‘Close relationship’ was arbitrarily assigned to scores within five
standard score points of each other, i.e., within 0.5 σ. Using this
criterion, there was little discernible relationship: isolating the 
best, worst, and middle scores did not reveal many closely related 
pairs. 
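The ‘close relationship’ criterion behind the asterisks in Tables 1 and 2 amounts to a simple filter on score differences. This sketch treats the five-point limit as inclusive. That matches Table 1, which stars the pair 54 and 49. The example pairs are the best ten spellers from Table 1.

```python
def close_pairs(pairs, tolerance=5):
    """Return (spelling, vocabulary) pairs whose standard scores differ by at most
    `tolerance` points; 5 points = 0.5 sigma on a scale with sigma = 10."""
    return [(a, b) for a, b in pairs if abs(a - b) <= tolerance]

# Best ten spelling scores from Table 1 with their vocabulary scores
BEST_SPELLERS = [(63, 60), (63, 51), (65, 62), (66, 38), (66, 35),
                 (66, 45), (66, 43), (68, 64), (68, 66), (75, 58)]

print(close_pairs(BEST_SPELLERS))  # [(63, 60), (65, 62), (68, 64), (68, 66)]
```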

Another possibility suggested itself. Since the directions offered
many topics on which to write essays, perhaps a comparison among
students discussing the same topic would show a pattern. Students
choosing a common theme may discuss it at different vocabulary
levels. In this comparison, vocabulary level could be measured by
the number of words not in Horn’s list, and the quality of
vocabulary so indicated could be compared with spelling ability.
Table 3 groups four topics which together accounted for half the
essays. Within each group, students are arranged according to
spelling score, with the corresponding vocabulary score entered to
show the number of words which are not in Horn’s list. Here again
no close relationship appears.


TABLE 2.—VOCABULARY TEST: BEST, MIDDLE, AND WORST STANDARD
SCORES WITH CORRESPONDING SPELLING STANDARD SCORES

        Worst Scores              Middle Scores              Best Scores
 Stud.  Vocab.  Splng.     Stud.  Vocab.  Splng.     Stud.  Vocab.  Splng.
   40     23      48         23     49      42         93     62      65*
   90     28      63         66     49      57         76     63      58*
   22     30      42         34     49      46*        98     64      68*
   67     30      57         55     49      54*        33     65      45
   84     33      60         47     49      51*         3     65      28
   10     34      31*        82     49      60         59     65      55
   95     35      66          8     51      31         99     66      68*
   43     35      50         20     51      42         16     69      40
    2     35      25         61     51      55         75     72      58
   58     36      55         69     51      58         73     74      58

* Spelling scores within five standard score points (= 0.5 σ) of
corresponding vocabulary scores.


TABLE 3.—SPELLING STANDARD SCORES AND NUMBER OF WORDS IN ESSAYS
NOT IN HORN’S LIST, OF SELECTED STUDENTS GROUPED ACCORDING
TO FOUR ESSAY TOPICS

           Topic A                      Topic B
  Stud.  Splng.   No.           Stud.  Splng.   No.
         Score    Words                Score    Words
    6     28      25               5     28      10
   13     37       7              15     39      12
   18     40       6              28     43      11
   26     43      21              45     50      15
   27     43       6              63     55      14
   34     46      14              65     57       9
   39     48      20              71     58      40
   42     50      14              77     58      20
   51     53      21              79     60       0
   69     58      23              81     60      10
   74     58      25              84     60       2
   78     60       5
   86     62      18
          63

           Topic C                      Topic D
  Stud.  Splng.   No.           Stud.  Splng.   No.
         Score    Words                Score    Words
    7     30       5               9     31       1
   14     39      28              17     40      22
   35     46      24              38     46       9
   41     50      11              43     50       2
   57     55      12              61     55      17
   80     60      40              62     65      13
   83     60      15              70     58       8
   87     62       6              85     62       7
   90     63       6
   96     66      27
   98     68      22

These results apply to the restricted situation investigated. 
Other populations, however, and other ways of measuring spelling 
and vocabulary may show different patterns. Intelligence or read- 
ing or motivation may influence spelling and vocabulary in ways 
separate from any mutual relationship between these two factors. 
Special fields at the college level, like mathematics, English, or 
foreign languages, may have particular characteristics. 


CONCLUSIONS 


1) College freshmen vary in spelling ability and written
vocabulary. Spelling variation in the group tested was similar to
that in the general population, although centered on a higher mean.
Vocabulary varied, too, but there are no general population data
for comparison.

2) Of the three hypotheses set up for testing, only the second is 
justified: vocabulary complexity shows no correlation with spelling 
ability. There is no evidence in this study that freshmen inhibit 
their written vocabulary because of fear of misspelling. 

3) Since there is no relationship between spelling and vocabulary, 
spelling improvement by itself should not be expected to lead to 
better or freer written vocabulary. 


SUMMARY 


In this study one hundred university freshmen who had taken a
spelling test were selected; none had studied English composition
or spelling beyond high school, and their knowledge of English was
not affected by foreign extraction or unusual experience. Their
spelling scores were distributed normally from high to low. Each of
these freshmen had written as part of the entrance examination in English a
500-word essay, and had been warned that choice of words and 
clarity of expression were important and that mechanical errors 
like misspellings would be penalized. Scores were assigned to each 
essay to indicate the average frequency of appearance of the words 
used, based on Horn’s (2) list of 10,000 English words most com- 
monly used in writing. Each student’s spelling and vocabulary 
scores were compared. Both Pearson r (= .075) and inspection of
selected cases found only a chance relationship between spelling
ability and written vocabulary level. The methods used with this
group of one hundred freshmen show that low spelling ability does
not noticeably hamper vocabulary choice when there is pressure to
write as well as possible. Examining spelling and vocabulary in
other settings and with different instruments should help to test
this conclusion.
RORSCHACH CONFIGURATIONS ASSOCIATED
WITH COLLEGE ACHIEVEMENT¹


CHARLES C. McARTHUR AND STANLEY KING 


Department of Hygiene, Harvard University 


Vorhaus (5) recently defined four test patterns that she found 
accounting for nearly three-quarters of the Rorschachs she had 
collected from children attending a remedial reading center. She 
was able to explain the children’s ‘reading problems’ as symptoms 
of long-standing character difficulties revealed by the four test 
syndromes. The four test patterns that Vorhaus described may be 
characterized as follows: 

Type I is the merely formal record, with no other determinant 
save the form of the blot being used more than twice. 

Type II is the animal-movement dominated record, with nothing 
else save form being much used. 

Type III is the human-movement dominated record, with no 
accompanying development of other determinants. 

Type IV is dominated by inanimate movement and color re- 
sponses, with no other determinants being much utilized. 

We will interpret the psychological meanings of these patterns as 
they arise in our discussion. Detailed interpretations of them are 
suggested in Vorhaus’ article. 

Although in age her subjects ranged from six to twenty-one, and
although academic difficulties are not always a result of ‘reading
problems’, we thought it worthwhile to test her patterns as
prognosticators of academic difficulties in college. Past attempts
to relate the Rorschach to college adjustment have met more failure
than success. In the most hopeful study to date, Munroe (4), using
group administration of the test, showed a relation between
Rorschach patterns and student success at Sarah Lawrence, but an
attempt to apply her method at the University of Chicago (1)
brought negative results.





1 This experiment was supported in part by the Department of Hygiene,
Harvard University, the Harvard Medical School Department of
Psychiatry and the Department of Social Relations; and in part by Air
Force Contract A.F. 33 (038) 201-42, from the School of Aviation
Medicine, Randolph Field, Texas.


PROCEDURE 


The writers set out to compare the frequency of Vorhaus types
among individual Rorschach protocols collected from two groups:
(1) one hundred and thirty-seven referrals during the years
1950-1952 to Harvard’s Department of Hygiene, all of these students
being in some sort of academic or personal difficulty, and all tested
by McArthur; and (2) seventy-four controls, who were being tested
as part of another experiment at Harvard Medical School. The
latter were all Harvard College juniors or seniors in the classes of
’52 and ’53 and were selected at random from the class lists. They
were classmates of, or at least undergraduates contemporaneous
with, the Hygiene referrals. They were tested by King.

Our expectations were first that Vorhaus types I and II would
be rare in our two samples of college students, inasmuch as people
of these types seem to lack the resources to carry them so far in
their education.
Secondly, we expected that type IV would be common, and would 
occur more frequently in the Hygiene cases than in the controls. 
Vorhaus’ description of this type sounds like a common variety of 
college problem-student, known already to the physicians at 
Harvard. As Vorhaus says: 

“The implication ...is that the subject is responsive to affec- 
tive stimulation; indeed, he is responsive to a point where moments 
of strong feeling occur as often, or almost as often, as do those 
when a more surface pleasantness is all that is evoked.... The 
subject may have succeeded in repressing recognition that this is 
so. 
“We again glimpse ... the ‘good’ home, the submissive child, 
and the awareness of pressure.... Since the resentment (of the 
pressure) cannot be overcome, the psychological need becomes 
that of preventing it from being experienced as associated with the 
environment. This is done by turning the hostility against the self, 
for the ‘guilt’ of harboring the resentment .... With this
accomplished, the subject is able to feel that ‘Mother and Father are
entirely just in all their demands and expectations. It is I who am
guilty for not cooperating with them. It is because of my
inadequacy and inferiority that their “good” plans have not worked out.’”

This unhappy state of mind is seen all too commonly in the men 
who come, or are sent, to the Department of Hygiene for help. 
They are usually in academic trouble as a matter of administrative 
fact or else are ‘failing’ in their own eyes because they have not 
met their own and their parents’ level of aspiration. In this respect
these Hygiene cases are probably quite similar to the reading clinic 
cases seen by Vorhaus. 

Her description of type IV did not seem to fit what was known 
of the men in the control group. These men had been extensively 
studied in an experiment on laboratory stress-inducing situations 
(2) and on the whole had been judged to be stable, well-adjusted 
students. Therefore, we had grounds for expecting that type IV 
would be less prevalent among this randomly selected group than 
among the Hygiene cases. 

Before making direct comparisons, it was necessary to be certain 
that our results were not contaminated by differences in scoring. 
To test inter-scorer reliability, McArthur rescored twenty of 
King’s protocols, selecting those in which slight changes in scoring 
would be most likely to result in changes of type. The result was 
that only one test needed to be reclassified. The effect of the
tester’s personality on Rorschach responses could be checked in only
one case, in which both of us tested the same subject; in that
instance, the configuration was not altered.


RESULTS 


When we compared the Hygiene cases with their controls, the 
results were as predicted. Vorhaus types I, II, and III were com- 
paratively rare, and in no case numerous enough to differentiate 
significantly between controls and Hygiene cases. Type IV proto- 
cols were more frequent, differentiating the Hygiene group from 
the control group at the .01 confidence level. Furthermore, Table 
1 shows that this contrast was reliable over two years of Hygiene 
experience. 

Two circumstances in the data (which are presented in Table 1) 
raised questions that needed exploration. Only half of the Hygiene 
cases were included in the four Vorhaus types and only one-quarter 
of the control cases. While that difference was in the predicted 
direction, and was itself a satisfactory finding, it leaves fifty to 
seventy-five per cent of the Rorschachs unexplained. Then, too, 
the contrast between Hygiene and controls with respect to type
III records was, though not significant, in the wrong direction. What
is happening here? 

An examination of the Vorhaus check list defining type III
shows that its characteristics are: a) M (human movement) has a
numerical value of 3 or more. b) FM (animal movement) cannot
exceed one and one-half times M. c) m (inanimate movement) does
not exceed one-half times M. d) the color sum (weighted total of
responses to color) is less than 3.


TABLE 1.—RELATIVE FREQUENCIES OF RORSCHACH TYPES IN HYGIENE
REFERRALS AND CONTROLS

Type          Hygiene ’50-’51   Hygiene ’51-’52   Controls      Significance
Vorhaus I        7 per cent        2 per cent       7 per cent    none
Vorhaus II       2 per cent        0 per cent       3 per cent    none
Vorhaus III      5 per cent        6 per cent      14 per cent    none
Vorhaus IV      35 per cent       39 per cent       2 per cent    .01
N                   55                82               74

On casual inspection, one would not take this as a syndrome that 
would bode ill for academic performance or for reading skills. Tra- 
ditionally, the M response has been taken to be the intellectually 
‘best’ kind of response, and type III seems to be an M-dominated 
record. 

The rationale, of course, is that the type III person has great 
assets, as shown by his movement responses, but that these assets 
are not put to socialized use. Without adequate development of the 
color responses (as quantified by condition d), one suspects that 
the type III man has a rich but too exclusively inner mental life. 
Vorhaus (5) feels that “the instinctual drives have been integrated 
into the individual’s total functioning, thus becoming the source 
of creative energy and achievement,” but that there is “failure, 
both in the capacity to establish rapport and in the ability to react 
to the environment deeply and genuinely.” The result, she sug- 
gests, is a “determination to preserve his creative drives as his 
personal (and secret) treasure.” 

There are those who would say that such a person would be a 
‘natural’ student for Harvard. This is the born resident of the 
ivory tower, who should fit into a university that has been charac- 
terized by F. L. Wells (7) as ‘dominantly cerebrotonic’. Yet even 
‘cerebrotonic’ schools demand that students pass tests. Even in 
such institutions, some ‘capacity to establish rapport’ is a pre- 
requisite for getting grades. It should therefore be the type III 
people, with the addition of adequate color response, who make 
the best Harvard students. If the color response may be taken as
indicating the ability to relate inner resources to outer environ- 
ment, such modified type III subjects should presumably display 
resources that are available for socialized use and are not, like the 
resources of pure type III people, retained as ‘private treasures’. 

We therefore may define a new type, IIIA, as people who meet
the first three criteria of Vorhaus type III but are not bound by
type III’s fourth limitation. Type IIIA people show: a) M with a
value of 3 or more. b) FM less than one and one-half times M. c)
m less than one-half times M. d) color sum of 3 or more.

Empirically, they turn out very well. They are found in our 
groups at the following rates: 

               Hygiene ’50-’51   Hygiene ’51-’52   Controls      Significance
Type IIIA      7 per cent        7 per cent        33 per cent   .01

The difference is in the predicted direction and reaches the .01 level
of confidence. Once again, the Hygiene figures look reliable. 

An even stricter definition of the ideal student’s Rorschach 
might demand that his color responses be developed in the direc- 
tion of form-dominated color responses rather than of the use of 
color alone or color dominating form as a determinant. That, on 
orthodox Rorschach theory (3), would indicate not only ability to 
relate oneself to the environment, but specific skill at tying one’s 
impulses in with the social environment. It is usually FC that is 
taken as the sign of ‘ability to establish rapport’. 

So we made a stricter definition of our ‘good student’ subtype, 
demanding, besides the criteria for type IIIA, that FC (form-color
responses) should numerically exceed CF (color-form) and C (pure 
color). For this narrower group (which, of course, is selected within 
the ranks of type IIIA), we find these frequencies: 
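Type IIIB adds a single condition to type IIIA. A minimal Python sketch, assuming that "FC should numerically exceed CF and C" means FC greater than the combined count CF + C (a common way of expressing this comparison in Rorschach scoring); reading it as FC exceeding each separately would change only the comparison line. The function names are hypothetical, and type IIIA status is taken as already determined.

```python
def fc_dominates(fc: int, cf: int, c: int) -> bool:
    # Assumption: form-color must exceed color-form plus pure color combined.
    return fc > cf + c


def is_type_iiib(is_iiia: bool, fc: int, cf: int, c: int) -> bool:
    """Type IIIB is selected within type IIIA, so both conditions must hold."""
    return is_iiia and fc_dominates(fc, cf, c)


print(is_type_iiib(True, fc=4, cf=1, c=0))   # FC-dominated IIIA record
print(is_type_iiib(True, fc=2, cf=2, c=1))   # IIIA record dominated by CF and C
print(is_type_iiib(False, fc=5, cf=0, c=0))  # not IIIA, so not IIIB
```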

               Hygiene ’50-’51   Hygiene ’51-’52   Controls      Significance
Type IIIB      0 per cent        1 per cent        22 per cent   .01

Though we lose cases, the exceptions to our rule are almost
eliminated as we sharpen our definition. In fact, the lone Hygiene
record showing this pattern of M- and FC-dominated scores was made
by a boy who had been referred for purely vocational guidance
and not for the usual academic, personal, or administrative
troubles.

We must hasten to point out that well-developed M and FC
responses are usually sufficient, but not necessary, conditions for
academic success. Wells (6) has shown, in a recent study at
Harvard, that twenty per cent of the National Scholars here actually
had no human movement responses!

From these facts, we may reach certain firm conclusions and 
suggest other likely hypotheses. 

First, we have offered additional evidence for the validity of 
Vorhaus’ typology. Those types for whom she suggests minimal
resources are not much seen at Harvard, while her type IV, the only
one commonly admitted to this college, is found almost exclusively 
among people who were in academic difficulty. (We might add that 
type III, seen at Harvard with some frequency and not character- 
istic of Hygiene cases, was the type for which Vorhaus foresaw the 
best prognosis.) The fact that her scheme could be transferred 
from New York to Cambridge and from pre-college to college stu- 
dents suggests that it possesses more widespread validity than is 
always found in methods for identifying poor students with the 
Rorschach. 

Second, both Vorhaus’ types and our extensions of her typology
to include some ‘desirable’ academic syndromes are based on usual
Rorschach theory. Both typologies have been empirically shown 
to be valid and reliable. We have therefore some inferential evi- 
dence that the Rorschach, interpreted in the customary manner, 
can lead to correct conclusions about college performance. The 
limitation seems to be that only some Rorschachs point directly to 
good or poor academic performance, while other Rorschach pat- 
terns occur infrequently or are ambiguous as to their significance. 

Third, the positive results shown here point to the need for
studies in which the Rorschachs are administered as the student 
enters college. Ours is a study after the fact. We cannot establish 
whether the test patterns we observed were long-term
characteristics of the men tested or whether the records they gave
reflected their recent experience of months or years of college
success or failure.
The need is to administer the test to a sample of an entering class 
and immediately make sealed predictions, to be checked after the 
student has accumulated his good or poor academic record. Our 
present findings suggest that such a study would have positive and
practically applicable results. 
A COMPARISON OF THE STANFORD-BINET,
REVISED FORM L, AND THE CALIFORNIA
TEST OF MENTAL MATURITY (S-FORM)¹


WILLIAM D. SHELDON 
Director of the Reading Laboratory, Syracuse University 


and 


GEORGE MANOLAKES 


Ball State Teachers College, Muncie, Ind. 


An investigation was made of the characteristics of good and
poor readers in grades I-XII in eight school systems in the State
of New York. This article reports the results of the Stanford-Binet
Intelligence Scale, Revised Form L, and the California Test of
Mental Maturity (S-Form), administered to each child studied in
grades I-VI. The study grew out of the need to compare the two
tests in order to assess the intellectual status of the pupils more
accurately. The Grace Arthur, Form II, was also administered to
each of the pupils in the study, and a comparison between this
test and the Stanford-Binet has been published.²


SELECTION OF SUBJECTS 


The subjects represent ten per cent of all the pupils in grades
I-VI in the eight school systems participating in the study. The
population included 4220 pupils from fourteen elementary and four
rural schools, of whom four hundred and twenty-two were included
in this study. They were selected by their teachers, who used three
criteria: (1) achievement tests in reading, administered before
selection; (2) their own rating of each pupil’s status in reading;
and (3) scores from intelligence tests administered before the
selection. The teachers were asked to choose from their classes
five per cent of their pupils who were good readers and five per
cent who were poor readers. The four hundred and twenty-two pupils
tested ranged in age from five years ten months to twelve years
eleven months.

¹ The authors wish to acknowledge the cooperation of the Bureau of
School Services, School of Education, and the Psychological Services
Center of Syracuse University in making this research project possible.

² G. Manolakes and W. D. Sheldon, “A comparison of the Grace Arthur,
Revised Form II, and the Stanford-Binet, Revised Form L,” Journal of
Educational and Psychological Measurement, 12, Spring 1952, pp. 105-108.


PROCEDURE 


The Stanford-Binet and the California Test of Mental Maturity 
were administered to each of the four hundred and twenty-two 
pupils in the study by members of the research team. The con- 
ditions under which the tests were administered were the best that 
could be obtained in the various schools. The Stanford-Binet test 
was administered individually by skilled clinicians, while the
California Test of Mental Maturity was administered to small groups
by individuals trained in giving this instrument. The time interval 
between the two tests ranged from one day to four months, with an 
average interval of four weeks. 


RESULTS 


Table I reveals no significant difference between the means of the
California Test of Mental Maturity and the Binet. The difference was
evaluated both by the t test and by the critical ratio of the
difference to its standard error. The coefficient of correlation
between the two sets of scores was .702. The stability of this
coefficient was determined by Fisher’s r-to-z transformation: at the
.01 level, the true correlation would lie between .629 and .757.
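Both computations in this paragraph can be reproduced from the constants in Table I. A minimal Python sketch: the critical ratio uses the standard error of the difference between correlated means, with each standard error of the mean taken as S.D./sqrt(N - 1) (an assumption that reproduces the tabled .925 and .819), and the confidence interval uses Fisher's z = atanh(r), standard error 1/sqrt(N - 3), and the two-tailed .01 value 2.576.

```python
import math


def critical_ratio_correlated(m1, m2, sd1, sd2, n, r):
    """Critical ratio for the difference between two correlated means."""
    se1 = sd1 / math.sqrt(n - 1)  # assumption: S_M = S.D. / sqrt(N - 1)
    se2 = sd2 / math.sqrt(n - 1)
    se_diff = math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)
    return (m1 - m2) / se_diff


def fisher_interval(r, n, z_crit=2.576):
    """Confidence interval for r via Fisher's r-to-z transformation."""
    z = math.atanh(r)
    se_z = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)


# Binet and CTMM constants from Table I.
cr = critical_ratio_correlated(108.443, 107.806, 18.984, 16.804, 422, 0.702)
lo, hi = fisher_interval(0.702, 422)
print(round(cr, 3), round(lo, 3), round(hi, 3))
```

With these inputs the sketch gives a critical ratio of about .94 and an interval of roughly .63 to .76. The small differences from the tabled .629 to .757 appear to come from the table's z_r of .867 (atanh(.702) is about .871) and its multiplier of 2.586.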

Table I also reveals that the Binet distribution is significantly
(positively) skewed. This might be expected from the manner in which
the group was selected. The California Test of Mental Maturity
scores, however, were not significantly skewed. For neither
distribution does the measure of kurtosis differ significantly from
normal.
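The skewness and kurtosis tests in Table I appear to use the large-sample standard errors sqrt(6/N) and sqrt(24/N), which reproduce the tabled .119 and .238 for N = 422, together with one-tailed normal-curve probabilities. A minimal Python sketch under that assumption:

```python
import math

N = 422


def one_tailed_p(stat, se):
    """One-tailed normal-curve probability for |stat| / se."""
    z = abs(stat) / se
    return 0.5 * math.erfc(z / math.sqrt(2))


se_g1 = math.sqrt(6 / N)   # large-sample standard error of skewness g1
se_g2 = math.sqrt(24 / N)  # large-sample standard error of kurtosis g2

# g1 and g2 values from Table I.
moments = {
    "Binet": {"g1": 0.491, "g2": 0.0396},
    "CTMM": {"g1": 0.015, "g2": -0.0136},
}

results = {}
for test, g in moments.items():
    results[test] = (
        one_tailed_p(g["g1"], se_g1),
        one_tailed_p(g["g2"], se_g2),
    )
    print(test, "skewness p =", f"{results[test][0]:.5f}",
          "kurtosis p =", f"{results[test][1]:.4f}")
```

Only the Binet skewness has a very small probability (about .00002); the other three probabilities are well above .40, in line with the text.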

According to Table II, the two scores obtained from the same
individual differed by more than 10 IQ points in 44.6 per cent of the
cases, by more than 15 IQ points in 24.6 per cent, and by more than
20 points in 13.7 per cent.
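These percentages can be recomputed from the counts in Table II by adding the "Binet higher" and "CTMM higher" frequencies in every interval above the threshold. A minimal Python sketch; the list names are illustrative, and the counts are copied from Table II:

```python
# Table II counts by IQ-difference interval (lower bound of each interval).
LOWER_BOUNDS = [1, 6, 11, 16, 21, 26, 31, 36, 41, 46]
BINET_HIGHER = [64, 47, 42, 20, 15, 7, 6, 3, 2, 1]
CTMM_HIGHER = [65, 49, 42, 26, 9, 10, 4, 1, 0, 0]
NO_DIFFERENCE = 9
TOTAL = sum(BINET_HIGHER) + sum(CTMM_HIGHER) + NO_DIFFERENCE


def percent_exceeding(points):
    """Per cent of pupils whose two IQs differ by more than `points`.

    Only valid for thresholds on interval boundaries (5, 10, 15, ...).
    """
    count = sum(
        b + c
        for low, b, c in zip(LOWER_BOUNDS, BINET_HIGHER, CTMM_HIGHER)
        if low > points
    )
    return 100 * count / TOTAL


for threshold in (10, 15, 20):
    print(threshold, round(percent_exceeding(threshold), 1))
```

Computed from the counts, the three figures come out at about 44.5, 24.6, and 13.7 per cent. The text's 44.6 matches the sum of the rounded percentage columns of Table II.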
TABLE I. FREQUENCY DISTRIBUTION AND CONSTANTS FOR CALIFORNIA
TEST OF MENTAL MATURITY (S-FORM) AND STANFORD-BINET,
REVISED FORM L

Binet f       IQ        CTMM f
    6      160-169         0
   11      150-159         4
   11      140-149        10
   28      130-139        35
   61      120-129        66
   66      110-119        68
   86      100-109        91
   76       90-99         95
   59       80-89         33
   14       70-79         12
    4       60-69          7
    0       50-59          0
    0       40-49          1
  422         N          422

Constants                  Binet        CTMM
Mean                     108.443     107.806
S_M                         .925        .819
S_(M1-M2)                        .680
CR                               .937
P                                .347
Median                   106.244     106.423
S.D.                      18.984      16.804
S_s                         .653        .578
S_(s1-s2)                        .623
CR                              3.497
P                               .0002
Range                     66-169      47-156
Q3-Q1                     28.135      25.881
Skewness g1                 .491        .015
S_g1                        .119        .119
P                         .00002       .4433
Kurtosis g2                .0396      -.0136
S_g2                        .238        .238
P                          .4325       .4761

Correlation
r                                .702
S_r                              .0247
z_r                              .867
s_z                              .049
z_r ± 2.586 s_z          .741 to .993
r range                  .629 to .757








TABLE II. DIFFERENCES IN IQ POINTS BETWEEN THE STANFORD-BINET,
REVISED FORM L, AND THE CALIFORNIA TEST OF MENTAL MATURITY
(S-FORM)

      Binet Higher           IQ           CTMM Higher
  per cent       N       Difference      N      per cent
    15.2        64          1-5         65        15.4
    11.1        47          6-10        49        11.6
    10.0        42         11-15        42        10.0
     4.7        20         16-20        26         6.2
     3.6        15         21-25         9         2.1
     1.7         7         26-30        10         2.4
     1.4         6         31-35         4         0.9
     0.7         3         36-40         1         0.2
     0.5         2         41-45         0         0.0
     0.2         1         46-50         0         0.0
    49.1       207         Total       206        48.8

No difference: 9 cases (2.1 per cent)

















The average algebraic and arithmetic differences between the two
tests at each chronological age level are given in Table III. The
general trend of the average algebraic differences between the
scores for individuals indicates that the California Test of Mental
Maturity yields higher scores than the Binet at the six- and
seven-year levels, while the Binet yields higher scores at ages nine,
ten, and eleven. The scores are in closest agreement at the
eight-year level. The differences at ages twelve and thirteen are not
considered meaningful because of the small number of pupils at these
ages (eighteen and five, respectively).

The average arithmetic (absolute) difference between the IQ scores
on the two tests is 10.943 points. The greatest difference, 14.446
points, is in the ten-year-old group.
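The Total row of Table III can be checked as an N-weighted average of the age-level rows. A minimal Python sketch, assuming the Total row is weighted by the number of pupils at each age (an inference, since the article does not say how it was computed):

```python
# Rows of Table III: (age, N, average algebraic difference, average arithmetic difference).
# Negative algebraic values favor the California Test of Mental Maturity.
TABLE_III = [
    (6, 61, -4.197, 11.527),
    (7, 82, -1.890, 9.415),
    (8, 70, 0.314, 8.788),
    (9, 71, 3.606, 12.535),
    (10, 56, 3.839, 14.446),
    (11, 59, 3.305, 10.763),
    (12, 18, -2.778, 8.278),
    (13, 5, 3.000, 9.000),
]


def weighted_averages(rows):
    """Return (total N, N-weighted algebraic mean, N-weighted arithmetic mean)."""
    total_n = sum(n for _, n, _, _ in rows)
    algebraic = sum(n * alg for _, n, alg, _ in rows) / total_n
    arithmetic = sum(n * arith for _, n, _, arith in rows) / total_n
    return total_n, algebraic, arithmetic


n, alg, arith = weighted_averages(TABLE_III)
print(n, round(alg, 3), round(arith, 3))
```

This recovers N = 422 and averages of about 0.573 and 10.94, in line with the Total row.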

TABLE III. AVERAGE ALGEBRAIC AND ARITHMETIC DIFFERENCES BETWEEN
THE STANFORD-BINET, REVISED FORM L, AND THE CALIFORNIA TEST OF
MENTAL MATURITY (S-FORM) AT THE CHRONOLOGICAL AGE LEVELS

Age      N     Algebraic Difference*    Arithmetic Difference
 6      61          -4.197                   11.527
 7      82          -1.890                    9.415
 8      70           0.314                    8.788
 9      71           3.606                   12.535
10      56           3.839                   14.446
11      59           3.305                   10.763
12      18          -2.778                    8.278
13       5           3.000                    9.000
Total  422           0.573 (average)         10.943 (average)

* Negative numbers indicate a difference in favor of the California
Test of Mental Maturity scores.

In Table IV a clear trend appears when the results of the two tests
are compared by IQ groups. The average algebraic differences indicate
that at the upper Binet levels the Binet scores are higher than those
of the California Test of Mental Maturity. This difference decreases
almost progressively until the average range is reached; at and below
the average range, the California Test of Mental Maturity yields the
higher IQ.

TABLE IV. AVERAGE ALGEBRAIC DIFFERENCES BY IQ GROUPS
DETERMINED BY BINET IQ

Binet IQ      N     Algebraic Difference*
160-169       6          33.833
150-159      11          18.273
140-149      11          13.909
130-139      28          10.607
120-129      61           2.787
110-119      66           5.712
100-109      86          -2.965
90-99        76          -4.566
80-89        59          -6.475
70-79        14          -9.143
60-69         4          -0.075

* Negative numbers indicate a difference in favor of the California
Test of Mental Maturity.

Regression equations have been computed to demonstrate the
disparity between the tests at different IQ levels; they further
substantiate the differences already indicated. The regression
formula for predicting Binet IQ’s (X) from C.T.M.M. IQ’s (Y) is:

$$X' = .793\,Y + 22.953, \qquad S_{x \cdot y} = 13.517$$

The regression formula for predicting C.T.M.M. IQ’s from Binet
IQ’s is:

$$Y' = .621\,X + 40.463, \qquad S_{y \cdot x} = 11.964$$
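Both regression formulas follow from the means, standard deviations, and correlation in Table I, taking X as the Binet IQ and Y as the C.T.M.M. IQ. A minimal Python sketch of the usual least-squares formulas (slope = r times the ratio of standard deviations; standard error of estimate = S.D. times sqrt(1 - r^2)); the function name is illustrative:

```python
import math


def regression_line(mean_pred, sd_pred, mean_from, sd_from, r):
    """Least-squares line predicting one score from the other.

    Returns (slope, intercept, standard error of estimate).
    """
    slope = r * sd_pred / sd_from
    intercept = mean_pred - slope * mean_from
    se_est = sd_pred * math.sqrt(1 - r**2)
    return slope, intercept, se_est


# Table I constants: X = Binet, Y = CTMM.
MEAN_X, SD_X = 108.443, 18.984
MEAN_Y, SD_Y = 107.806, 16.804
R = 0.702

binet_from_ctmm = regression_line(MEAN_X, SD_X, MEAN_Y, SD_Y, R)
ctmm_from_binet = regression_line(MEAN_Y, SD_Y, MEAN_X, SD_X, R)
print([round(v, 3) for v in binet_from_ctmm])
print([round(v, 3) for v in ctmm_from_binet])
```

The sketch recovers slopes of about .793 and .621 and standard errors of estimate of about 13.52 and 11.97. Its intercepts (about 22.95 and 40.42) differ slightly from the printed ones, which are matched exactly when the slopes are first rounded to three decimals.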
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